From: Zhongkun He
Date: Thu, 21 Mar 2024 14:36:14 +0800
Subject: Re: [External] Re: [bug report] mm/zswap :memory corruption after zswap_load().
To: Chengming Zhou
Cc: Johannes Weiner, Yosry Ahmed, Andrew Morton, linux-mm, wuyun.abel@bytedance.com, zhouchengming@bytedance.com, Nhat Pham

On Thu, Mar 21, 2024 at 1:24 PM Chengming Zhou wrote:
>
> On 2024/3/21 13:09, Zhongkun He wrote:
> > On Thu, Mar 21, 2024 at 12:42 PM Chengming Zhou
> > wrote:
> >>
> >> On 2024/3/21 12:34, Zhongkun He wrote:
> >>> Hey folks,
> >>>
> >>> Recently, I tested the zswap with memory reclaiming in the mainline
> >>> (6.8) and found a memory corruption issue related to exclusive loads.
> >>
> >> Is this fix included? 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
> >> This fix avoids concurrent swapin using the same swap entry.
> >>
> >
> > Yes, This fix avoids concurrent swapin from different cpu, but the
> > reported issue occurs
> > on the same cpu.
>
> I think you may misunderstand the race description in this fix changelog,
> the CPU0 and CPU1 just mean two concurrent threads, not real two CPUs.
>
> Could you verify if the problem still exists with this fix?

Yes, I'm sure the problem still exists with this patch.
Here is some debug info (not from mainline).

bpftrace -e'k:swap_readpage {printf("%lld, %lld,%ld,%ld,%ld\n%s",
((struct page *)arg0)->private,nsecs,tid,pid,cpu,kstack)}' --include
linux/mm_types.h

offset nsecs tid pid cpu
2140659, 595771411052,15045,15045,6
        swap_readpage+1
        do_swap_page+2135
        handle_mm_fault+2426
        do_user_addr_fault+462
        do_page_fault+48
        async_page_fault+62

offset nsecs tid pid cpu
2140659, 595771424445,15045,15045,6
        swap_readpage+1
        do_swap_page+2135
        handle_mm_fault+2426
        do_user_addr_fault+462
        do_page_fault+48
        async_page_fault+62

-------------------------------
There are two page faults with the same tid and offset within 13393 nsecs.

>
> >
> > Thanks.
> >
> >> Thanks.
> >>
> >>>
> >>>
> >>> root@**:/sys/fs/cgroup/zz# stress --vm 5 --vm-bytes 1g --vm-hang 3 --vm-keep
> >>> stress: info: [31753] dispatching hogs: 0 cpu, 0 io, 5 vm, 0 hdd
> >>> stress: FAIL: [31758] (522) memory corruption at: 0x7f347ed1a010
> >>> stress: FAIL: [31753] (394) <-- worker 31758 returned error 1
> >>> stress: WARN: [31753] (396) now reaping child worker processes
> >>> stress: FAIL: [31753] (451) failed run completed in 14s
> >>>
> >>>
> >>> 1. Test step(the frequency of memory reclaiming has been accelerated):
> >>> -------------------------
> >>> a. set up the zswap, zram and cgroup V2
> >>> b. echo 0 > /sys/kernel/mm/lru_gen/enabled
> >>>    (Increase the probability of problems occurring)
> >>> c. mkdir /sys/fs/cgroup/zz
> >>>    echo $$ > /sys/fs/cgroup/zz/cgroup.procs
> >>>    cd /sys/fs/cgroup/zz/
> >>>    stress --vm 5 --vm-bytes 1g --vm-hang 3 --vm-keep
> >>>
> >>> e. in other shell:
> >>>    while :;do for i in {1..5};do echo 20g >
> >>>    /sys/fs/cgroup/zz/memory.reclaim & done;sleep 1;done
> >>>
> >>> 2. Root cause:
> >>> --------------------------
> >>> With a small probability, the page fault will occur twice with the
> >>> original pte, even if a new pte has been successfully set.
> >>> Unfortunately, zswap_entry has been released during the first page fault
> >>> with exclusive loads, so zswap_load will fail, and there is no corresponding
> >>> data in swap space, memory corruption occurs.
> >>>
> >>> bpftrace -e'k:zswap_load {printf("%lld, %lld\n", ((struct page
> >>> *)arg0)->private,nsecs)}'
> >>> --include linux/mm_types.h > a.txt
> >>>
> >>> look up the same index:
> >>>
> >>> index nsecs
> >>> 1318876, 8976040736819
> >>> 1318876, 8976040746078
> >>>
> >>> 4123110, 8976234682970
> >>> 4123110, 8976234689736
> >>>
> >>> 2268896, 8976660124792
> >>> 2268896, 8976660130607
> >>>
> >>> 4634105, 8976662117938
> >>> 4634105, 8976662127596
> >>>
> >>> 3. Solution
> >>>
> >>> Should we free zswap_entry in batches so that zswap_entry will be
> >>> valid when the next page fault occurs with the
> >>> original pte? It would be great if there are other better solutions.
> >>>
> >>
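
For anyone who wants to repeat the "look up the same index" step on a.txt,
it could be scripted roughly as below. This is only a sketch: it assumes each
useful line has the "index, nsecs" shape printed by the zswap_load probe
above, and both the helper name find_repeated_loads and the 100000 ns
threshold are made up for illustration, not taken from any tool.

```python
from collections import defaultdict


def find_repeated_loads(lines, max_gap_ns=100_000):
    """Return (index, first_nsecs, second_nsecs, gap_ns) for each pair of
    consecutive loads of the same index that are at most max_gap_ns apart.

    max_gap_ns is an arbitrary, hypothetical threshold.
    """
    stamps_by_index = defaultdict(list)
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        # Skip the header, blank lines and anything that is not "int, int".
        if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        stamps_by_index[int(parts[0])].append(int(parts[1]))

    repeats = []
    for index, stamps in stamps_by_index.items():
        stamps.sort()
        # Compare each timestamp with the next one for the same index.
        for first, second in zip(stamps, stamps[1:]):
            if second - first <= max_gap_ns:
                repeats.append((index, first, second, second - first))
    return repeats


# Sample rows copied from the trace quoted above.
sample = """\
index nsecs
1318876, 8976040736819
1318876, 8976040746078

4123110, 8976234682970
4123110, 8976234689736
"""

for index, first, second, gap in find_repeated_loads(sample.splitlines()):
    print(f"{index}: {first} -> {second} (+{gap} ns)")
```

On the real file, passing open("a.txt") instead of sample.splitlines()
should work the same way, since the helper only iterates over lines.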