Linux-NFS Archive mirror
From: "NeilBrown" <neilb@suse.de>
To: "Lex Siegel" <usiegl00@gmail.com>
Cc: "Chuck Lever" <chuck.lever@oracle.com>,
	"Jeff Layton" <jlayton@kernel.org>,
	"Olga Kornievskaia" <kolga@netapp.com>,
	"Dai Ngo" <Dai.Ngo@oracle.com>, "Tom Talpey" <tom@talpey.com>,
	"Trond Myklebust" <trond.myklebust@hammerspace.com>,
	"Anna Schumaker" <anna@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	linux-nfs@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH] xprtsock: Fix a loop in xs_tcp_setup_socket()
Date: Mon, 22 Apr 2024 13:44:45 +1000
Message-ID: <171375748540.7600.5672163982570379489@noble.neil.brown.name>
In-Reply-To: <CAHCWhjScokCi7u_98-i6E_xHaSJnFGY6dnkv9-C5-yrpihVJFg@mail.gmail.com>

On Mon, 22 Apr 2024, Lex Siegel wrote:
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM.  If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> 
> The question is whether a permission error, EPERM, should cause a retry or
> a return. Currently xs_tcp_setup_socket() retries, and the retry loop only
> ends once the connect call stops returning a permission error.
> 
> This is the default behavior because EPERM is not an explicit case in the
> switch statement. Because bpf appropriately uses EPERM to show that the
> kernel_connect was not permitted, it highlights that return handling for this
> case is missing. It is unlikely that retry was ever the intended result.
> 
> Upstream, the bpf that caused this is at:
> https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336
> 
> This cilium bpf code has two return statuses, EPERM and ENXIO, that fall
> through to the default case of retrying. Here, cilium expects both of these
> statuses to indicate the connect failed. A retry is not the intended result.
> 
> Handling this case without a retry aligns this code with the udp behavior. The
> precedent for passing EPERM back up the stack was set in 3dedbb5ca10ef.
> 
> I will amend my patch to include an explicit case for ENXIO as well, as this is
> also in cilium's bpf and will cause the same bug to occur.
> 
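The fall-through described in the quoted message can be modelled in a few lines of user-space C. This is a minimal sketch, assuming a simplified version of the error switch at the end of xs_tcp_setup_socket(): only the cases visible in the quoted patch are listed (the real switch handles more), and the function name report_connect_status() is hypothetical. An errno with no explicit case is logged and turned into -EAGAIN, so the caller schedules another connect attempt.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical, simplified model of the error handling at the end of
 * xs_tcp_setup_socket(). Only the cases visible in the quoted patch are
 * listed; the real switch handles more errno values.
 *
 * Returns the status that waiting RPC tasks would be woken with.
 */
static int report_connect_status(int status)
{
	switch (status) {
	case -EINVAL:
	case -ECONNREFUSED:
	case -ECONNRESET:
	case -ENETDOWN:
		/* Explicit case: the real error is passed up the stack. */
		return status;
	default:
		/*
		 * Unlisted errors, such as -EPERM or -ENXIO from a BPF
		 * program, are logged and turned into -EAGAIN, so the
		 * caller schedules another connect attempt.
		 */
		fprintf(stderr, "connect returned unhandled error %d\n", status);
		return -EAGAIN;
	}
}
```

If a BPF program rejects every attempt, each retry lands in the default branch again, which is consistent with the endless loop and log spam described in the original patch.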

I think it should be up to cilium to report an errno that the kernel
understands, not up to the kernel to understand whatever errno cilium
chooses to return.

I don't think EPERM or ENXIO are appropriate errors for network
problems.
EHOSTUNREACH or ECONNREFUSED would make much more sense.
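
A minimal sketch of the alternative raised in this thread, mapping unexpected errno values to familiar network errors right where connect is called. It is written as user-space C around connect(2) rather than kernel_connect(); the helper names map_connect_error() and connect_mapped() are hypothetical, and the specific choice of EPERM to ECONNREFUSED and ENXIO to EHOSTUNREACH is an assumption for illustration.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: translate errno values that a BPF connect hook
 * may return into errors the network code already has explicit cases
 * for. The chosen targets are an assumption for illustration.
 */
static int map_connect_error(int err)
{
	switch (err) {
	case -EPERM:	/* connect rejected by a BPF program */
		return -ECONNREFUSED;
	case -ENXIO:	/* also returned by cilium's BPF, per this thread */
		return -EHOSTUNREACH;
	default:
		return err;
	}
}

/* Call-site wrapper: returns 0 on success or a negative, mapped errno. */
static int connect_mapped(int fd, const struct sockaddr *addr, socklen_t len)
{
	if (connect(fd, addr, len) == 0)
		return 0;
	return map_connect_error(-errno);
}
```

With this mapping, a rejected connect reaches an explicit case such as -ECONNREFUSED instead of the default retry branch; whether the mapping belongs in sunrpc or in the BPF program itself is the point under discussion.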

NeilBrown


> 
> On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@suse.de> wrote:
> >
> > On Sat, 20 Apr 2024, Lex Siegel wrote:
> > > When using a bpf on kernel_connect(), the call can return -EPERM.
> > > This causes xs_tcp_setup_socket() to loop forever, filling up the
> > > syslog and causing the kernel to freeze up.
> > >
> > > Signed-off-by: Lex Siegel <usiegl00@gmail.com>
> > > ---
> > >  net/sunrpc/xprtsock.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > index bb9b747d58a1..47b254806a08 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> > >               /* Happens, for instance, if the user specified a link
> > >                * local IPv6 address without a scope-id.
> > >                */
> > > +     case -EPERM:
> > > +             /* Happens, for instance, if a bpf is preventing the connect */
> >
> > This will propagate -EPERM up into other layers which might not be ready
> > to handle it.
> > It might be safer to map EPERM to an error we would be more likely to
> > expect from the network system - such as ECONNREFUSED or ENETDOWN.
> >
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM.  If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> >
> > NeilBrown
> >
> >
> > >       case -ECONNREFUSED:
> > >       case -ECONNRESET:
> > >       case -ENETDOWN:
> > > --
> > > 2.39.3
> > >
> > >
> >
> 



Thread overview: 6+ messages
2024-04-20 10:48 [PATCH] xprtsock: Fix a loop in xs_tcp_setup_socket() Lex Siegel
2024-04-20 11:06 ` Jeff Layton
2024-04-21 22:32 ` Trond Myklebust
2024-04-21 23:22 ` NeilBrown
2024-04-22  3:32   ` Lex Siegel
2024-04-22  3:44     ` NeilBrown [this message]
