MPTCP Archive mirror
From: Geliang Tang <geliang@kernel.org>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf@vger.kernel.org, mptcp@lists.linux.dev,
	Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Mykola Lysenko <mykolal@fb.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>
Subject: Re: [PATCH bpf-next] selftests/bpf: Handle EAGAIN in bpf_tcp_ca
Date: Tue, 02 Apr 2024 18:53:10 +0800
Message-ID: <b3943f9a8bf595212b00e96ba850bf32893312cc.camel@kernel.org>
In-Reply-To: <4cb1511c-c623-497f-818e-a4d4614548ed@linux.dev>

Hi Martin,

On Fri, 2024-03-29 at 10:27 -0700, Martin KaFai Lau wrote:
> On 3/28/24 10:57 PM, Geliang Tang wrote:
> > Hi Martin,
> >
> > On Thu, 2024-03-28 at 09:55 -0700, Martin KaFai Lau wrote:
> > > On 3/28/24 3:23 AM, Geliang Tang wrote:
> > > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > > >
> > > > bpf_tcp_ca tests may emit EAGAIN sometimes. In that case, tests
> > > > fail with "bytes != total_bytes" errors. Sending should continue,
> > > > not break when errno is EAGAIN. This patch can make bpf_tcp_ca
> > > > tests stable.
> > > >
> > > > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > > > ---
> > > >  tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> > > > index 077b107130f6..fbc219c2d53b 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> > > > @@ -56,7 +56,7 @@ static void *server(void *arg)
> > > >  	while (bytes < total_bytes && !READ_ONCE(stop)) {
> > > >  		nr_sent = send(fd, &batch,
> > > >  			       MIN(total_bytes - bytes, sizeof(batch)), 0);
> > > > -		if (nr_sent == -1 && errno == EINTR)
> > > > +		if (nr_sent == -1 && (errno == EINTR || errno == EAGAIN))
> > >
> > > This is a non blocking socket. EAGAIN is hitting the timeout
> > > situation?
> > >
> > > The default timeout is 3s and it has not been changed after the
> > > recent connect_fd_to_fd and start_server simplifications. I don't
> > > find bpf CI failing in this test in the last month also.
> > >
> > > I would prefer to fail after timeout instead of keep retrying. Do
> > > you really hit that in your environment for this specific
> > > bpf_tcp_ca test? There are many tests using this timeout value also.
> >
> > This is the 2nd patch of "refactor mptcp bpf tests" series:
> >
> > https://patchwork.kernel.org/project/mptcp/cover/cover.1711688054.git.tanggeliang@kylinos.cn/
> >
> > I didn't get the mentioned EAGAIN errors in bpf_tcp_ca tests, but
> > got them in MPTCP BPF sched tests (see patch 1). MPTCP BPF sched
> > tests (not upstream yet) use the same sending and receiving
> > functions as bpf_tcp_ca tests (patch 15). So it makes sense to add
> > this fix for bpf_tcp_ca tests too.
>
> It sounds like the EAGAIN is specific to mptcp sched test and is not
> due to a timeout? Did you try to increase the timeout and see if it
> resolves the issue?
>
> afaik, there is no fix needed for the bpf_tcp_ca test. bpf CI expects
> the test to finish. If the default 3s turns out to be too flaky for
> most tests, it could be increased instead of loop testing EAGAIN
> because the bpf CI expects the test to finish. However, so far I don't
> see this (i.e. 3s is too short) to be the case from looking at one
> month old data in bpf CI.

Thanks for your suggestions. I finally fixed this issue by explicitly
setting the nonblock flag on the server sockets in the MPTCP BPF tests.
With that change, EAGAIN no longer appears in the tests. The fix is here:

https://patchwork.kernel.org/project/mptcp/patch/f0b66813ae8274e5653988d80d16171508f05796.1712042049.git.tanggeliang@kylinos.cn/

That means this patch "selftests/bpf: Handle EAGAIN in bpf_tcp_ca" can
be dropped now.
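
For reference, the behaviour discussed in this thread can be sketched in
plain C. This is a minimal sketch, not the selftest code: it assumes the
timeout is applied with SO_SNDTIMEO on a blocking socket, in which case a
send() that times out returns -1 with errno set to EAGAIN. The helper
names set_send_timeout() and send_all() and the 1500-byte batch size are
hypothetical, chosen only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: bound how long a blocking send() may wait. */
static int set_send_timeout(int fd, int seconds)
{
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/*
 * Hypothetical helper: send total_bytes of zeroed payload.
 * EINTR is retried; every other error, including EAGAIN from an
 * expired send timeout, ends the loop so the caller can fail the test.
 * Returns the number of bytes sent, or -1 with errno set.
 */
static ssize_t send_all(int fd, size_t total_bytes)
{
	char batch[1500] = { 0 };	/* arbitrary batch size (assumption) */
	size_t bytes = 0;

	while (bytes < total_bytes) {
		size_t len = total_bytes - bytes;
		ssize_t nr_sent;

		if (len > sizeof(batch))
			len = sizeof(batch);

		nr_sent = send(fd, batch, len, 0);
		if (nr_sent == -1 && errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		if (nr_sent == -1)
			return -1;	/* timeout (EAGAIN) or a real error */

		bytes += (size_t)nr_sent;
	}

	return (ssize_t)bytes;
}
```

With this shape, a stalled peer makes the sender give up after the
timeout instead of looping forever, which is the behaviour Martin asks
for above.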

> > And here's another reason. I want to move these functions from
> > bpf_tcp_ca into network_helpers as public ones (patch 4), which can
> > be used by both bpf_tcp_ca and MPTCP BPF sched tests. So we must add
> > this fix to the public ones too.
>
> Let's understand why mptcp sched test hits EAGAIN first.

The patches "export send_byte and send_recv_data into network_helpers"
have been sent to bpf-next, please review them when you have time:

https://patchwork.kernel.org/project/netdevbpf/cover/cover.1712039441.git.tanggeliang@kylinos.cn/


> > Maybe the commit log of this patch needs to be updated. Or I should
> > send patches 2, 3 and 4 together to bpf-next?
>
> All selftests/bpf changes have to pass bpf CI. It is always a good
> idea to target bpf-next to kick off the bpf CI to bar any surprise
> later when it got merged into bpf-next.
>
> Unrelated, since we are in the mptcp bpf test, one thing needs to be
> fixed in test_mptcpify(). The mptcpify test is upgrading an
> IPPROTO_TCP socket to an IPPROTO_MPTCP socket. This could break other
> tests when test_progs is run in parallel (test_progs "-j", which
> fork()s to run prog_tests/* in parallel). It could unexpectedly
> upgrade the tcp socket of another test. One option is to limit the
> upgrade by checking the pid in the mptcpify.c bpf prog.

I just sent a patch named "selftests/bpf: Add pid limit for mptcpify
prog" to fix this and added your "suggested-by" tag in it.
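
The pid check suggested above could look roughly like the following.
This is a plain C model of the decision logic, not the actual BPF
program: mptcpify_protocol() and its parameters are hypothetical names,
and the family/type conditions are an assumption about what the prog
checks. In a BPF program the caller's identity would come from
bpf_get_current_pid_tgid(), whose upper 32 bits hold the tgid (the
userspace process ID).

```c
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value used by Linux */
#endif

/*
 * Hypothetical model of a pid-limited "mptcpify" decision.
 * pid_tgid: value shaped like the result of bpf_get_current_pid_tgid()
 * test_pid: process ID of the test that wants its sockets upgraded
 * Returns the protocol the new socket should end up with.
 */
static int mptcpify_protocol(uint64_t pid_tgid, uint32_t test_pid,
			     int family, int type, int protocol)
{
	/* Leave sockets created by any other process untouched. */
	if ((uint32_t)(pid_tgid >> 32) != test_pid)
		return protocol;

	/* Upgrade only plain TCP stream sockets (assumed condition). */
	if ((family == AF_INET || family == AF_INET6) &&
	    type == SOCK_STREAM &&
	    (protocol == 0 || protocol == IPPROTO_TCP))
		return IPPROTO_MPTCP;

	return protocol;
}
```

Because the comparison happens before anything else, tests running in
other forked test_progs workers keep their plain TCP sockets.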

Thanks,
-Geliang



Thread overview: 6+ messages
2024-03-28 10:23 [PATCH bpf-next] selftests/bpf: Handle EAGAIN in bpf_tcp_ca Geliang Tang
2024-03-28 11:16 ` MPTCP CI
2024-03-28 16:55 ` Martin KaFai Lau
2024-03-29  5:57   ` Geliang Tang
2024-03-29 17:27     ` Martin KaFai Lau
2024-04-02 10:53       ` Geliang Tang [this message]
