All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: "David Ahern" <dsahern@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	netdev@vger.kernel.org, "Neal Cardwell" <ncardwell@google.com>,
	"Dragos Tatulea" <dtatulea@nvidia.com>,
	eric.dumazet@gmail.com, "Maciej Żenczykowski" <maze@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Shachar Kagan" <skagan@nvidia.com>
Subject: Re: [PATCH net-next 1/2] tcp: conditionally call ip_icmp_error() from tcp_v4_err()
Date: Thu, 18 Apr 2024 12:22:16 +0200	[thread overview]
Message-ID: <CANn89i+BKDL-BHqHyev9PAzbHqp8xhkC=4kZTB7vydcBVkc0Nw@mail.gmail.com> (raw)
In-Reply-To: <CANn89iJg7AcxMLbvwnghN85L6ASuoKsSSSHdgaQzBU48G1TRiw@mail.gmail.com>

On Thu, Apr 18, 2024 at 12:15 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Apr 18, 2024 at 11:58 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Thu, 2024-04-18 at 11:26 +0200, Eric Dumazet wrote:
> > > On Thu, Apr 18, 2024 at 10:03 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Thu, Apr 18, 2024 at 10:02 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote:
> > > > > > Blamed commit claimed in its changelog that the new functionality
> > > > > > was guarded by IP_RECVERR/IPV6_RECVERR :
> > > > > >
> > > > > >     Note that applications need to set IP_RECVERR/IPV6_RECVERR option to
> > > > > >     enable this feature, and that the error message is only queued
> > > > > >     while in SYN_SNT state.
> > > > > >
> > > > > > This was true only for IPv6, because ipv6_icmp_error() has
> > > > > > the following check:
> > > > > >
> > > > > > if (!inet6_test_bit(RECVERR6, sk))
> > > > > >     return;
> > > > > >
> > > > > > Other callers check IP_RECVERR by themselves, it is unclear
> > > > > > if we could factorize these checks in ip_icmp_error()
> > > > > >
> > > > > > For stable backports, I chose to add the missing check in tcp_v4_err()
> > > > > >
> > > > > > We think this missing check was the root cause for commit
> > > > > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving
> > > > > > some ICMP") breakage, leading to a revert.
> > > > > >
> > > > > > Many thanks to Dragos Tatulea for conducting the investigations.
> > > > > >
> > > > > > As Jakub said :
> > > > > >
> > > > > >     The suspicion is that SSH sees the ICMP report on the socket error queue
> > > > > >     and tries to connect() again, but due to the patch the socket isn't
> > > > > >     disconnected, so it gets EALREADY, and throws its hands up...
> > > > > >
> > > > > >     The error bubbles up to Vagrant which also becomes unhappy.
> > > > > >
> > > > > >     Can we skip the call to ip_icmp_error() for non-fatal ICMP errors?
> > > > > >
> > > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users")
> > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > Cc: Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > Cc: Maciej Żenczykowski <maze@google.com>
> > > > > > Cc: Willem de Bruijn <willemb@google.com>
> > > > > > Cc: Neal Cardwell <ncardwell@google.com>
> > > > > > Cc: Shachar Kagan <skagan@nvidia.com>
> > > > > > ---
> > > > > >  net/ipv4/tcp_ipv4.c | 3 ++-
> > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > > > > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644
> > > > > > --- a/net/ipv4/tcp_ipv4.c
> > > > > > +++ b/net/ipv4/tcp_ipv4.c
> > > > > > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
> > > > > >               if (fastopen && !fastopen->sk)
> > > > > >                       break;
> > > > > >
> > > > > > -             ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th);
> > > > > > +             if (inet_test_bit(RECVERR, sk))
> > > > > > +                     ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th);
> > > > > >
> > > > > >               if (!sock_owned_by_user(sk)) {
> > > > > >                       WRITE_ONCE(sk->sk_err, err);
> > > > >
> > > > > We have a fcnal-test.sh self-test failure:
> > > > >
> > > > > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh
> > > > >
> > > > > that I suspect are related to this patch (or the following one): the
> > > > > test case creates a TCP connection on loopback and this is the only
> > > > > patchseries touching the related code, included in the relevant patch
> > > > > burst.
> > > > >
> > > > > Could you please have a look?
> > > >
> > > > Sure, thanks Paolo !
> > >
> > > First patch is fine, I see no failure from fcnal-test.sh (as I would expect)
> > >
> > > For the second one, I am not familiar enough with this very slow test
> > > suite (all these "sleep 1" ... oh well)
> >
> > @David, some of them could be replaced with loopy_wait calls
> >
> > > I guess "failing tests" depended on TCP connect() to immediately abort
> > > on one ICMP message,
> > > depending on old kernel behavior.
> > >
> > > I do not know how to launch a subset of the tests, and trace these.
> > >
> > > "./fcnal-test.sh -t ipv4_tcp" alone takes more than 9 minutes [1] in a
> > > VM running a non debug kernel :/
> > >
> > > David, do you have an idea how to proceed ?
> >
> > One very dumb thing I do in that cases is commenting out the other
> > tests, something alike (completely untested!):
> > ---
> > diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh
> > index 386ebd829df5..494932aa99b2 100755
> > --- a/tools/testing/selftests/net/fcnal-test.sh
> > +++ b/tools/testing/selftests/net/fcnal-test.sh
> > @@ -1186,6 +1186,7 @@ ipv4_tcp_novrf()
> >  {
> >         local a
> >
> > +if false; then
> >         #
> >         # server tests
> >         #
> > @@ -1271,6 +1272,7 @@ ipv4_tcp_novrf()
> >                 log_test_addr ${a} $? 1 "Device server, unbound client, local connection"
> >         done
> >
> > +fi
> >         a=${NSA_IP}
> >         log_start
> >         run_cmd nettest -s &
> > @@ -1487,12 +1489,14 @@ ipv4_tcp()
> >         set_sysctl net.ipv4.tcp_l3mdev_accept=0
> >         ipv4_tcp_novrf
> >         log_subsection "tcp_l3mdev_accept enabled"
> > +if false; then
> >         set_sysctl net.ipv4.tcp_l3mdev_accept=1
> >         ipv4_tcp_novrf
> >
> >         log_subsection "With VRF"
> >         setup "yes"
> >         ipv4_tcp_vrf
> > +fi
> >  }
>
> Thanks Paolo
>
> I found that the following patch is fixing the issue for me.
>
> diff --git a/tools/testing/selftests/net/nettest.c
> b/tools/testing/selftests/net/nettest.c
> index cd8a580974480212b45d86f35293b77f3d033473..ff25e53024ef6d4101f251c8a8a5e936e44e280f
> 100644
> --- a/tools/testing/selftests/net/nettest.c
> +++ b/tools/testing/selftests/net/nettest.c
> @@ -1744,6 +1744,7 @@ static int connectsock(void *addr, socklen_t
> alen, struct sock_args *args)
>         if (args->bind_test_only)
>                 goto out;
>
> +       set_recv_attr(sd, args->version);
>         if (connect(sd, addr, alen) < 0) {
>                 if (errno != EINPROGRESS) {
>                         log_err_errno("Failed to connect to remote host");

When tracing nettest we now have EHOSTUNREACH

3343  setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0 <0.000210>
3343  setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "eth1\0", 5) = 0 <0.000170>
3343  setsockopt(3, SOL_IP, IP_PKTINFO, [1], 4) = 0 <0.000161>
3343  setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0 <0.000181>
3343  connect(3, {sa_family=AF_INET, sin_port=htons(12345),
sin_addr=inet_addr("172.16.2.1")}, 16) = -1 EINPROGRESS (Operation now
in progress) <0.000874>
3343  pselect6(1024, NULL, [3], NULL, {tv_sec=5, tv_nsec=0}, NULL) = 1
(out [3], left {tv_sec=1, tv_nsec=930762080}) <3.069673>
3343  getsockopt(3, SOL_SOCKET, SO_ERROR, [EHOSTUNREACH], [4]) = 0 <0.000270>

As mentioned in net/ipv4/icmp.c :
 RFC 1122: 3.2.2.1 States that NET_UNREACH, HOST_UNREACH and SR_FAILED
MUST be considered 'transient errs'.

Maybe another way to fix nettest would be to change wait_for_connect()
to pass a non NULL fdset in 4th argument of select()

select(FD_SETSIZE, NULL, &wfd, NULL /* here */, tv);

  reply	other threads:[~2024-04-18 10:22 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-17 16:57 [PATCH net-next 0/2] tcp: tcp_v4_err() changes Eric Dumazet
2024-04-17 16:57 ` [PATCH net-next 1/2] tcp: conditionally call ip_icmp_error() from tcp_v4_err() Eric Dumazet
2024-04-17 17:08   ` Maciej Żenczykowski
2024-04-18  3:22   ` Jason Xing
2024-04-18  6:45     ` Eric Dumazet
2024-04-18  6:53       ` Jason Xing
2024-04-18  8:02   ` Paolo Abeni
2024-04-18  8:03     ` Eric Dumazet
2024-04-18  9:26       ` Eric Dumazet
2024-04-18  9:58         ` Paolo Abeni
2024-04-18 10:15           ` Eric Dumazet
2024-04-18 10:22             ` Eric Dumazet [this message]
2024-04-18 10:36               ` Eric Dumazet
2024-04-18 17:46             ` David Ahern
2024-04-18 17:47               ` Eric Dumazet
2024-04-18 18:02                 ` David Ahern
2024-04-18 18:09                   ` Eric Dumazet
2024-04-18 18:09                 ` Jakub Kicinski
2024-04-18 18:14                   ` Eric Dumazet
2024-04-18 20:20                   ` David Ahern
2024-04-18 17:56         ` David Ahern
2024-04-17 16:57 ` [PATCH net-next 2/2] tcp: no longer abort SYN_SENT when receiving some ICMP (II) Eric Dumazet
2024-04-18  3:24   ` Jason Xing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANn89i+BKDL-BHqHyev9PAzbHqp8xhkC=4kZTB7vydcBVkc0Nw@mail.gmail.com' \
    --to=edumazet@google.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=dtatulea@nvidia.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kuba@kernel.org \
    --cc=maze@google.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=skagan@nvidia.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.