v9fs.lists.linux.dev archive mirror
From: Matthieu Baerts <matthieu.baerts@tessares.net>
To: Eric Van Hensbergen <ericvh@kernel.org>
Cc: v9fs@lists.linux.dev, MPTCP Upstream <mptcp@lists.linux.dev>,
	regressions@lists.linux.dev
Subject: 9p: MPTCP tests regressions due to new 9p features in v6.4
Date: Tue, 6 Jun 2023 16:30:43 +0200
Message-ID: <855a232a-76d3-7e7b-b2b5-2ebc41bcadd6@tessares.net>

Hi Eric and other 9p devs,

TL;DR: it looks like there is a (small?) problem with the new 9p
features you recently sent, causing MPTCP tests to be unstable. It is
tracked here: https://github.com/multipath-tcp/mptcp_net-next/issues/400


First, thank you very much for maintaining this very useful FS!

For the MPTCP subsystem, we are running various tests. Many are run by a
public CI [1] in a VM using QEMU with 9p, thanks to Virtme [2]. Virtme,
its dependencies and a script to compile the kernel and run the tests
are "packaged" in a Docker container, which eases deployment and the
reproduction of issues in the same environment [3].


Since v6.4-rc1, we noticed that our public CI was reporting various
instabilities, mainly with Packetdrill tests. Packetdrill [4] lets us
easily invoke system calls, craft and "inject" network packets, and
verify that the kernel generates the expected ones. The issue was
difficult to reproduce, but after some investigation it looks like,
since v6.4-rc1, Packetdrill is slower to craft and inject packets,
causing the kernel to hit its retransmission timeout and retransmit
packets the tests do not expect; the tests are then marked as failed.
For more details, a ticket [5] has been opened in our public bug tracker.
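To make the failure mode concrete, here is a toy shell sketch of such a
tolerance check (this is NOT packetdrill; the 1-second tolerance and
2-second delay are invented for the illustration): an event arriving
outside the allowed window gets flagged, much like the unexpected
retransmissions flag our tests.

```shell
# Toy illustration of a timing-tolerance check, NOT real packetdrill logic.
# The sleep stands in for the slowed-down test runner; the numbers are
# invented for this sketch.
start=$(date +%s)
sleep 2                                   # "event" arrives 2s later
elapsed=$(( $(date +%s) - start ))
if [ "$elapsed" -gt 1 ]; then             # allowed window: 1 second
  echo "FAIL: event arrived ${elapsed}s late (tolerance: 1s)"
else
  echo "OK: event within tolerance"
fi
```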

I ran a 'git bisect' and found out that it seems to be caused by the new
9p features for 6.4:

  8e15605be8ba ("Merge tag '9p-6.4-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs")
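For anyone who wants to see the workflow, here is a self-contained toy
demo of such a bisection (the throwaway repo, file name and "bug" are
invented for the demo; the real session ran on the kernel tree between
v6.3 and v6.4-rc1, rebuilding and rebooting the VM at each step):

```shell
# Self-contained 'git bisect' toy demo in a throwaway repo.
# All names here are made up; only the workflow mirrors the real session.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5 6; do
  echo "state $i" > state
  # the "regression" appears at commit 4 in this demo
  if [ "$i" -ge 4 ]; then echo bug >> state; fi
  git add state
  git commit -qm "commit $i"
done
# bad = HEAD, good = root commit; 'git bisect run' automates the check:
# exit 0 means good, non-zero means bad
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
git bisect run sh -c '! grep -q bug state' | tee bisect.log
git bisect reset >/dev/null
```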


If I revert these commits on top of our "export" branch, I can no longer
reproduce the bug:

  $ git log --oneline --reverse v6.3-rc1..21e26d5e54ab
  d9bc0d11e33b fs/9p: Consolidate file operations and add readahead and
writeback
  740b8bf87322 fs/9p: Remove unnecessary superblock flags
  8142db4f2792 fs/9p: allow disable of xattr support on mount
  46c30cb8f539 9p: Add additional debug flags and open modes
  6deffc8924b5 fs/9p: Add new mount modes
  1543b4c5071c fs/9p: remove writeback fid and fix per-file modes
  4eb3117888a9 fs/9p: Rework cache modes and add new options to
Documentation
  21e26d5e54ab fs/9p: Fix bit operation logic error

'git bisect' seems to suggest the issue is due to the first commit
d9bc0d11e33b ("fs/9p: Consolidate file operations and add readahead and
writeback").
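As a side note, reverting such a contiguous range can be done with a
single 'git revert' invocation. Here is a self-contained toy demo of the
pattern (file and commit names are invented; the real revert targeted
the eight commits listed above on top of our "export" branch):

```shell
# Toy demo: revert a contiguous commit range with one command.
# The repo and names are invented; only the pattern is relevant.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
echo base > f
git add f
git commit -qm base
first=""
for i in 1 2 3; do
  echo "change $i" >> f
  git commit -qam "feature $i"
  if [ -z "$first" ]; then first=$(git rev-parse HEAD); fi
done
# FIRST^..HEAD covers all three commits; git reverts them newest-first,
# so each revert applies cleanly
git revert --no-edit "$first^..HEAD" >/dev/null
cat f
```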

It is not clear why these modifications cause such issues, because the
Packetdrill tests do not perform intensive reads and do no writes to
the disk (output goes only to stdout). Not much data has to be read
from the disk: one small Python script (255 lines [6]) launches
multiple threads, each executing the same bash script (22 lines [7])
that launches the same "packetdrill" binary (1.1 MB) in a newly
created, dedicated netns. Each test then reads a different test script
of ~30 lines, e.g. [8]. I think the problem is most often seen at the
end of a test. From what I see [9], Packetdrill tries to read the
whole test script; this is confirmed by a quick look at the strace
output available on the ticket [5].


It is unclear to me what else I can do and share with you to help fix
this "regression". I can quite easily reproduce the issue on my side
and provide more info if needed.

Just in case you wish to reproduce it on your side using our Docker
container [3], it is easy:

  $ cd [kernel source code]
  $ cat <<'EOF' > .virtme-exec-run
for z in $(seq 15); do
  run_packetdrill_one mptcp/dss || true
  grep -q "[l]ive packet payload:" "${OUTPUT_VIRTME}" && break
  rm "${RESULTS_DIR}"/*.tap; done
EOF
  $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

This will compile the kernel in the ".virtme" dir with the required
.config. The VM will then start and run one category of packetdrill
tests 15 times. On my side, I was able to reproduce the issue on a busy
machine: packetdrill had to run ~5 times. To stress my machine, I ran
stress-ng in parallel (outside the VM) while the tests were being
executed (after the compilation, so as not to slow everything down),
e.g. with:

  nproc2=$(nproc); nproc2=$((nproc2 * 2))
  stress-ng --cpu "${nproc2}" --io "${nproc2}" --vm "${nproc2}" \
            --vm-bytes 1G --timeout 60m


I hope you don't mind if I cc the regressions ML: this is probably not
an important regression, but I would not have guessed that the issues
we had when running network tests were due to 9p, so this report might
help others with similar issues :)

#regzbot introduced: d9bc0d11e33b


Cheers,
Matt

[1] https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net
[2] https://github.com/amluto/virtme
[3] https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
[4] https://github.com/google/packetdrill/
[5] https://github.com/multipath-tcp/mptcp_net-next/issues/400
[6]
https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/packetdrill/run_all.py
[7]
https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/packetdrill/in_netns.sh
[8]
https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/dss/mpc_with_data_client.pkt
[9]
https://github.com/google/packetdrill/blob/master/gtests/net/packetdrill/parser.y#L168
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Thread overview: 16+ messages
2023-06-06 14:30 Matthieu Baerts [this message]
     [not found] ` <CAFkjPTnX0=-GK8sFzd4S+V2+cA8E-FAqYHNndZui2Sh_MvoHPw@mail.gmail.com>
2023-06-06 15:08   ` 9p: MPTCP tests regressions due to new 9p features in v6.4 Matthieu Baerts
2023-06-06 18:47     ` evanhensbergen
2023-06-13 14:56       ` Linux regression tracking (Thorsten Leemhuis)
2023-06-13 16:07         ` Eric Van Hensbergen
2023-06-13 16:27           ` Matthieu Baerts
2023-06-22  7:53             ` Linux regression tracking (Thorsten Leemhuis)
2023-06-29 14:29               ` Linux regression tracking #update (Thorsten Leemhuis)
2023-08-07 15:42           ` Matthieu Baerts
2023-08-07 15:56             ` Eric Van Hensbergen
2023-06-29  8:46 ` Eric Van Hensbergen
2023-06-29 13:26   ` Matthieu Baerts
2023-06-29 15:17     ` Eric Van Hensbergen
2023-06-29 16:41       ` Matthieu Baerts
2023-06-29 17:04         ` Eric Van Hensbergen
2023-06-29 17:25           ` Matthieu Baerts
