I'm seeing a problem where a client process does a TCP connect(2) at the
same time that a SIGKILL is sent to the process doing listen(2).
Occasionally, seemingly depending on some exact timing/race, the connect()
will succeed, but the client is never notified that the server closed the
connection (no FIN/RST packet), and a poll()/select() in the client waits
indefinitely.

The behaviour is as if the code that handles the process kill first walks the
list of existing connections (including the listen(2) backlog) and sends FIN
to each client, and only then shuts down the listening socket, after which a
SYN is answered with RST. That leaves a small time window in between where
new connections are still acknowledged with SYN/ACK but are no longer shut
down with either FIN or RST.

This is all on 127.0.0.1 loopback, so no network or packet-loss issues are
involved.

$ uname -a
Linux urd 5.10.0-8-amd64 #1 SMP Debian 5.10.46-5 (2021-09-23) x86_64 GNU/Linux

Also reproduced on several other kernel versions and on RISC-V.

Attached is a perl script that reproduces the problem, also available here:

  https://knielsen-hq.org/test_listen_backlog_on_server_kill.pl

The script repeatedly forks a server process, establishes some connections,
does kill -9 of the server, tries to re-connect at the same time, and tests
whether the reconnect is handled correctly (either refused, or notified of
close). For me, the problem usually occurs within a few hundred iterations.
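
For readers who do not want to read the Perl, here is a minimal Python sketch
of one iteration of the same idea. It is not the attached script: the function
name run_iteration, the ephemeral port, the backlog size and the 2-second
timeout are all assumptions made for illustration, and the listener is created
before fork() only so that the parent knows the port.

```python
import os
import select
import signal
import socket


def run_iteration(timeout=2.0):
    """One kill-versus-connect round; returns 'refused', 'closed' or 'timeout'."""
    # Create the listener in the parent so the port is known. After fork()
    # the parent closes its copy, so only the child keeps the socket open.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))
    listener.listen(128)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        # Server child: never accept(), so connections stay in the backlog.
        while True:
            signal.pause()
    listener.close()

    # A connection established well before the kill.
    early = socket.create_connection(("127.0.0.1", port))

    os.kill(pid, signal.SIGKILL)
    # Race a new connect() against the kernel tearing down the listener.
    extra = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        extra.connect(("127.0.0.1", port))
    except ConnectionRefusedError:
        outcome = "refused"      # SYN answered with RST: handled correctly
    else:
        # Connected: the client should now see EOF or a reset promptly.
        readable, _, _ = select.select([extra], [], [], timeout)
        outcome = "closed" if readable else "timeout"
    finally:
        extra.close()
        early.close()
        os.waitpid(pid, 0)       # reap the killed child
    return outcome
```

Calling run_iteration() in a loop and stopping at the first "timeout" result
corresponds to the failure described here; "refused" and "closed" are the two
correct outcomes.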

Here is example output from a run that triggers the error, together with the
corresponding tcpdump output:

-----------------------------------------------------------------------
AHA! select() on extra connection timed out on iteration 67!
Extra connection fd=19 port=59404

14:49:55.066435 lo    In  IP localhost.59404 > localhost.2345: Flags [S], seq 4284834695, win 65495, options [mss 65495,sackOK,TS val 152268719 ecr 0,nop,wscale 7], length 0
14:49:55.066465 lo    In  IP localhost.2345 > localhost.59404: Flags [S.], seq 3024130858, ack 4284834696, win 65483, options [mss 65495,sackOK,TS val 152268719 ecr 152268719,nop,wscale 7], length 0
14:49:55.066491 lo    In  IP localhost.59404 > localhost.2345: Flags [.], ack 1, win 512, options [nop,nop,TS val 152268719 ecr 152268719], length 0
14:50:05.077150 lo    In  IP localhost.59404 > localhost.2345: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 152278730 ecr 152268719], length 0
14:50:05.077183 lo    In  IP localhost.2345 > localhost.59404: Flags [R], seq 3024130859, win 0, length 0
-----------------------------------------------------------------------

We see the connection being established with SYN/ACK, but no FIN is sent
when the server process exits. Only 10 seconds later, when the script times
out the poll()/select(), does the client send FIN, which is answered with
RST as there is no longer a listening socket on port 2345.

Occasionally another behaviour is seen: the client's initial SYN packet is
not answered, causing a client retransmission (which is then answered with
RST):

-----------------------------------------------------------------------
Oops, connect() took 1.008663 seconds! (connect=No)

14:57:19.389914 lo    In  IP localhost.43856 > localhost.2345: Flags [S], seq 2851822367, win 65495, options [mss 65495,sackOK,TS val 152713043 ecr 0,nop,wscale 7], length 0
14:57:20.398363 lo    In  IP localhost.43856 > localhost.2345: Flags [S], seq 2851822367, win 65495, options [mss 65495,sackOK,TS val 152714051 ecr 0,nop,wscale 7], length 0
14:57:20.398415 lo    In  IP localhost.2345 > localhost.43856: Flags [R.], seq 0, ack 2851822368, win 0, length 0
-----------------------------------------------------------------------

A practical consequence of this bug is that if a server dies, a client may
seemingly re-establish its connection successfully, believe that it is again
connected to the (restarted) server, and wait for data. In reality the
client's connection is dead, and the client can wait indefinitely for data
or for an EOF/close notification on the socket.

This problem originates from the test suite / continuous integration of
MariaDB, a relational database. The test suite checks the correctness of
various scenarios in which the server process crashes. These tests very
occasionally fail due to a timeout on the re-established server connection,
which is caused by this bug. Original MariaDB bug for reference:

  https://jira.mariadb.org/browse/MDEV-30232

Any ideas? Is this a known issue?

 - Kristian.