From: Chuck Lever III <chuck.lever@oracle.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>,
Anna Schumaker <anna@kernel.org>,
Jeff Layton <jlayton@kernel.org>, Neil Brown <neilb@suse.de>,
Olga Kornievskaia <kolga@netapp.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [Bug 218743] New: NFS-RDMA-Connected Regression Found on Upstream Linux 6.9-rc1
Date: Fri, 19 Apr 2024 15:19:53 +0000
Message-ID: <D272E6E6-FBE8-4E86-A91D-B20F8D314FC1@oracle.com>
In-Reply-To: <20240419081520.57bf66c1@hermes.local>
> On Apr 19, 2024, at 11:15 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> I forward networking bugs to the maintainers.
> Netdev does not use bugzilla, not sure if NFS does.
>
> Begin forwarded message:
>
> Date: Thu, 18 Apr 2024 00:00:22 +0000
> From: bugzilla-daemon@kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 218743] New: NFS-RDMA-Connected Regression Found on Upstream Linux 6.9-rc1
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=218743
>
> Bug ID: 218743
> Summary: NFS-RDMA-Connected Regression Found on Upstream Linux 6.9-rc1
> Product: Networking
> Version: 2.5
> Kernel Version: 6.9-rc1
> Hardware: Intel
> OS: Linux
> Status: NEW
> Severity: high
> Priority: P3
> Component: Other
> Assignee: stephen@networkplumber.org
> Reporter: manuel.gomez@cornelisnetworks.com
> CC: dennis.dalessandro@cornelisnetworks.com
> Regression: Yes
> Bisected commit-id: e084ee673c77cade06ab4c2e36b5624c82608b8c
>
> On the Linux 6.9-rc1 kernel there is a performance regression for NFS file
> transfers when Connected IPoIB mode is enabled. The network switch is OPA
> (Omni-Path Architecture).
>
> The most recent good commit in my bisection was the v6.8 mainline kernel
> (e8f897f4afef0031fe618a8e94127a0934896aba). Bisecting from v6.8 to v6.9-rc1
> showed me that "[e084ee673c77cade06ab4c2e36b5624c82608b8c] svcrdma: Add Write
> chunk WRs to the RPC's Send WR chain" was indeed the culprit of the regression.
>
>
> Here are the steps I ran to reproduce the issue:
> 1. Establish IPoIB Connected Mode on both client and host nodes:
> "echo connected > /sys/class/net/ibs785/mode"
>
>
> 2. Start an NFS server on the host node:
> "systemctl start opafm
> sleep 10
> systemctl start nfs-server
> modprobe svcrdma
> echo "rdma 20049" > /proc/fs/nfsd/portlist
> mkdir -p /mnt/nfs_test
> mount -t tmpfs -o size=4096M tmpfs /mnt/nfs_test
> sudo exportfs -o fsid=0,rw,async,insecure,no_root_squash 192.168.2.0/255.255.255.0:/mnt/nfs_test_testrun/"
>
>
> 3. Ready the client node:
> "mkdir -p /mnt/nfs_test
> mount -o rdma,port=20049 192.168.2.1:/mnt/nfs_test_testrun /mnt/nfs_test_testrun/"
>
>
> 4. Run the actual test from the client node:
> "
> #!/bin/bash
>
> fsize=268435456
> jfile=/dev/shm/run_nfs_test2.junk
> tfile=/dev/shm/run_nfs_test2.tmp
> nfsfile=/mnt/nfs_test_testrun/run_nfs_test2.junk
> rm -r -f /mnt/nfs_test_testrun/
> rm -f ${tfile}
> rm -f ${jfile}
>
> dd if=/dev/urandom iflag=fullblock of=${jfile} bs=1024 count=$((fsize/1024));
>
> for i in {1..100}; do
> cp ${jfile} ${nfsfile}; # Bottleneck 1
>
> cp ${nfsfile} ${tfile}; # Bottleneck 2
>
> cmp ${jfile} ${tfile};
>
> rm -f ${tfile};
> echo "DONE with iter ${i}"
> done;
>
> rm -f ${jfile};
> rm -f ${tfile};
> echo "Done";
> "
>
>
> On v6.8 I was seeing this test taking about 1m50s to complete, for all 10
> iterations. On v6.9-rc1 it takes 3-7 minutes, and I also see these kernel
> traces printed continuously in dmesg during this regression:
>
> [23720.243905] INFO: task kworker/61:1:556 blocked for more than 122 seconds.
> [23720.251709] Not tainted 6.9.0-rc1 #1
> [23720.256387] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [23720.265268] task:kworker/61:1 state:D stack:0 pid:556 tgid:556 ppid:2 flags:0x00004000
> [23720.275822] Workqueue: events __svc_rdma_free [rpcrdma]
> [23720.281803] Call Trace:
> [23720.284630] <TASK>
> [23720.287067] __schedule+0x210/0x660
> [23720.291063] schedule+0x2c/0xb0
> [23720.294668] schedule_timeout+0x146/0x160
> [23720.299249] __wait_for_common+0x92/0x1d0
> [23720.303828] ? __pfx_schedule_timeout+0x10/0x10
> [23720.308987] __ib_drain_sq+0xfa/0x170 [ib_core]
> [23720.314190] ? __pfx_ib_drain_qp_done+0x10/0x10 [ib_core]
> [23720.320343] ib_drain_qp+0x71/0x80 [ib_core]
> [23720.325232] __svc_rdma_free+0x28/0x100 [rpcrdma]
> [23720.330604] process_one_work+0x196/0x3d0
> [23720.335185] worker_thread+0x2fc/0x410
> [23720.339470] ? __pfx_worker_thread+0x10/0x10
> [23720.344336] kthread+0xdf/0x110
> [23720.347941] ? __pfx_kthread+0x10/0x10
> [23720.352225] ret_from_fork+0x30/0x50
> [23720.356317] ? __pfx_kthread+0x10/0x10
> [23720.360602] ret_from_fork_asm+0x1a/0x30
> [23720.365083] </TASK>
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are the assignee for the bug.
>
Thanks, I've seen a performance regression on one system and
haven't been able to reproduce it elsewhere.
Please move this bug to Filesystems/NFS.
--
Chuck Lever