Re: [PATCH v6 3/3] libceph: just wait for more data to be available on the socket

CEPH-Devel archive mirror
 help / color / mirror / Atom feed

From: Xiubo Li <xiubli@redhat.com>
To: Luis Henriques <lhenriques@suse.de>
Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com,
	jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com
Subject: Re: [PATCH v6 3/3] libceph: just wait for more data to be available on the socket
Date: Wed, 28 Feb 2024 08:45:41 +0800	[thread overview]
Message-ID: <831bfb4a-6213-4e32-8c68-252be354342e@redhat.com> (raw)
In-Reply-To: <871q8x4yac.fsf@suse.de>


On 2/28/24 01:22, Luis Henriques wrote:
> Hi Xiubo!
>
> xiubli@redhat.com writes:
>
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> A short read may occur while reading the message footer from the
>> socket.  Later, when the socket is ready for another read, the
>> messenger shoudl invoke all read_partial* handlers, except the
>> read_partial_sparse_msg_data().  The contract between the messenger
>> and these handlers is that the handlers should bail if the area
>> of the message is responsible for is already processed.  So,
>> in this case, it's expected that read_sparse_msg_data() would bail,
>> allowing the messenger to invoke read_partial() for the footer and
>> pick up where it left off.
>>
>> However read_partial_sparse_msg_data() violates that contract and
>> ends up calling into the state machine in the OSD client. The
>> sparse-read state machine just assumes that it's a new op and
>> interprets some piece of the footer as the sparse-read extents/data
>> and then returns bogus extents/data length, etc.
>>
>> This will just reuse the 'total_resid' to determine whether should
>> the read_partial_sparse_msg_data() bail out or not. Because once
>> it reaches to zero that means all the extents and data have been
>> successfully received in last read, else it could break out when
>> partially reading any of the extents and data. And then the
>> osd_sparse_read() could continue where it left off.
> I'm seeing an issue with fstest generic/580, which seems to enter an
> infinite loop effectively rendering the testing VM unusable.  It's pretty
> easy to reproduce, just run the test ensuring to be using msgv2 (I'm
> mounting the filesystem with 'ms_mode=crc'), and you should see the
> following on the logs:
>
> [...]
>    libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
>    libceph: osd1 (2)192.168.155.1:6810 read processing error
>    libceph: mon0 (2)192.168.155.1:40608 session established
>    libceph: bad late_status 0x1
>    libceph: osd1 (2)192.168.155.1:6810 protocol error, bad epilogue
>    libceph: mon0 (2)192.168.155.1:40608 session established
>    libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
>    libceph: osd1 (2)192.168.155.1:6810 read processing error
>    libceph: mon0 (2)192.168.155.1:40608 session established
>    libceph: bad late_status 0x1
> [...]
>
> Reverting this patch (commit 8e46a2d068c9 ("libceph: just wait for more
> data to be available on the socket")) seems to fix.  I haven't
> investigated it further, but since it'll take me some time to refresh my
> memory, I thought I should report it immediately.  Maybe someone has any
> idea.

The following patch should fix it. I haven't test it yet. Will do it 
later today:


diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..2c32ad4d9774 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1860,10 +1860,10 @@ static int prepare_read_control_remainder(struct 
ceph_connection *con)
  static int prepare_read_data(struct ceph_connection *con)
  {
         struct bio_vec bv;
+       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);

         con->in_data_crc = -1;
-       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
-                                 data_len(con->in_msg));
+       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);

         get_bvec_at(&con->v2.in_cursor, &bv);
         if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) {


> Cheers,

     prev parent reply	other threads:[~2024-02-28  0:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-25  2:39 [PATCH v6 0/3] libceph: fix sparse-read failure bug xiubli
2024-01-25  2:39 ` [PATCH v6 1/3] libceph: fail the sparse-read if the data length doesn't match xiubli
2024-01-25  2:39 ` [PATCH v6 2/3] libceph: rename read_sparse_msg_XX to read_partial_sparse_msg_XX xiubli
2024-01-25  2:39 ` [PATCH v6 3/3] libceph: just wait for more data to be available on the socket xiubli
2024-02-27 17:22   ` Luis Henriques
2024-02-28  0:22     ` Xiubo Li
2024-02-28  0:45     ` Xiubo Li [this message]

find likely ancestor, descendant, or conflicting patches for this message:
dfblob:a0ca5414b33 dfblob:2c32ad4d977
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=831bfb4a-6213-4e32-8c68-252be354342e@redhat.com \
    --to=xiubli@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=lhenriques@suse.de \
    --cc=mchangir@redhat.com \
    --cc=vshankar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).