From: "NeilBrown" <neilb@suse.de>
To: "Lex Siegel" <usiegl00@gmail.com>
Cc: "Chuck Lever" <chuck.lever@oracle.com>,
"Jeff Layton" <jlayton@kernel.org>,
"Olga Kornievskaia" <kolga@netapp.com>,
"Dai Ngo" <Dai.Ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>,
"Trond Myklebust" <trond.myklebust@hammerspace.com>,
"Anna Schumaker" <anna@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
linux-nfs@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH] xprtsock: Fix a loop in xs_tcp_setup_socket()
Date: Mon, 22 Apr 2024 13:44:45 +1000 [thread overview]
Message-ID: <171375748540.7600.5672163982570379489@noble.neil.brown.name> (raw)
In-Reply-To: <CAHCWhjScokCi7u_98-i6E_xHaSJnFGY6dnkv9-C5-yrpihVJFg@mail.gmail.com>
On Mon, 22 Apr 2024, Lex Siegel wrote:
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM. If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
>
> The question is whether a permission error, EPERM, should cause a retry or
> return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear,
> the connect call will have to not return a permission error to halt the retry
> attempts.
>
> This is a default behavior because EPERM is not an explicit case of the switch
> statement. Because bpf appropriately uses EPERM to show that the kernel_connect
> was not permitted, it highlights the return handling for this case is missing.
> It is unlikely that retry was ever the intended result.
>
> Upstream, the bpf that caused this is at:
> https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336
>
> This cilium bpf code has two return statuses, EPERM and ENXIO, that fall
> through to the default case of retrying. Here, cilium expects both of these
> statuses to indicate the connect failed. A retry is not the intended result.
>
> Handling this case without a retry aligns this code with the udp behavior. This
> precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef.
>
> I will amend my patch to include an explicit case for ENXIO as well, as this is
> also in cilium's bpf and will cause the same bug to occur.
>
I think it should be up to cilium to report an errno that the kernel
understands, not up to the kernel to understand whatever errno cilium
chooses to return.
I don't think EPERM or ENXIO are appropriate errors for network
problems.
EHOSTUNREACH or ECONNREFUSED would make much more sense.
NeilBrown
>
> On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@suse.de> wrote:
> >
> > On Sat, 20 Apr 2024, Lex Siegel wrote:
> > > When using a bpf on kernel_connect(), the call can return -EPERM.
> > > This causes xs_tcp_setup_socket() to loop forever, filling up the
> > > syslog and causing the kernel to freeze up.
> > >
> > > Signed-off-by: Lex Siegel <usiegl00@gmail.com>
> > > ---
> > > net/sunrpc/xprtsock.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > index bb9b747d58a1..47b254806a08 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> > > /* Happens, for instance, if the user specified a link
> > > * local IPv6 address without a scope-id.
> > > */
> > > + case -EPERM:
> > > + /* Happens, for instance, if a bpf is preventing the connect */
> >
> > This will propagate -EPERM up into other layers which might not be ready
> > to handle it.
> > It might be safer to map EPERM to an error we would be more likely to
> > expect from the network system - such as ECONNREFUSED or ENETDOWN.
> >
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM. If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> >
> > NeilBrown
> >
> >
> > > case -ECONNREFUSED:
> > > case -ECONNRESET:
> > > case -ENETDOWN:
> > > --
> > > 2.39.3
> > >
> > >
> >
>
prev parent reply other threads:[~2024-04-22 3:45 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-20 10:48 [PATCH] xprtsock: Fix a loop in xs_tcp_setup_socket() Lex Siegel
2024-04-20 11:06 ` Jeff Layton
2024-04-21 22:32 ` Trond Myklebust
2024-04-21 23:22 ` NeilBrown
2024-04-22 3:32 ` Lex Siegel
2024-04-22 3:44 ` NeilBrown [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=171375748540.7600.5672163982570379489@noble.neil.brown.name \
--to=neilb@suse.de \
--cc=Dai.Ngo@oracle.com \
--cc=anna@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=chuck.lever@oracle.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jlayton@kernel.org \
--cc=kolga@netapp.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=tom@talpey.com \
--cc=trond.myklebust@hammerspace.com \
--cc=usiegl00@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).