From: Xiubo Li <xiubli@redhat.com>
To: Luis Henriques <lhenriques@suse.de>
Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com,
jlayton@kernel.org, vshankar@redhat.com, mchangir@redhat.com
Subject: Re: [PATCH v6 3/3] libceph: just wait for more data to be available on the socket
Date: Wed, 28 Feb 2024 08:45:41 +0800 [thread overview]
Message-ID: <831bfb4a-6213-4e32-8c68-252be354342e@redhat.com> (raw)
In-Reply-To: <871q8x4yac.fsf@suse.de>
On 2/28/24 01:22, Luis Henriques wrote:
> Hi Xiubo!
>
> xiubli@redhat.com writes:
>
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> A short read may occur while reading the message footer from the
>> socket. Later, when the socket is ready for another read, the
>> messenger shoudl invoke all read_partial* handlers, except the
>> read_partial_sparse_msg_data(). The contract between the messenger
>> and these handlers is that the handlers should bail if the area
>> of the message is responsible for is already processed. So,
>> in this case, it's expected that read_sparse_msg_data() would bail,
>> allowing the messenger to invoke read_partial() for the footer and
>> pick up where it left off.
>>
>> However read_partial_sparse_msg_data() violates that contract and
>> ends up calling into the state machine in the OSD client. The
>> sparse-read state machine just assumes that it's a new op and
>> interprets some piece of the footer as the sparse-read extents/data
>> and then returns bogus extents/data length, etc.
>>
>> This will just reuse the 'total_resid' to determine whether should
>> the read_partial_sparse_msg_data() bail out or not. Because once
>> it reaches to zero that means all the extents and data have been
>> successfully received in last read, else it could break out when
>> partially reading any of the extents and data. And then the
>> osd_sparse_read() could continue where it left off.
> I'm seeing an issue with fstest generic/580, which seems to enter an
> infinite loop effectively rendering the testing VM unusable. It's pretty
> easy to reproduce, just run the test ensuring to be using msgv2 (I'm
> mounting the filesystem with 'ms_mode=crc'), and you should see the
> following on the logs:
>
> [...]
> libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
> libceph: osd1 (2)192.168.155.1:6810 read processing error
> libceph: mon0 (2)192.168.155.1:40608 session established
> libceph: bad late_status 0x1
> libceph: osd1 (2)192.168.155.1:6810 protocol error, bad epilogue
> libceph: mon0 (2)192.168.155.1:40608 session established
> libceph: prepare_sparse_read_cont: ret 0x1000 total_resid 0x0 resid 0x0
> libceph: osd1 (2)192.168.155.1:6810 read processing error
> libceph: mon0 (2)192.168.155.1:40608 session established
> libceph: bad late_status 0x1
> [...]
>
> Reverting this patch (commit 8e46a2d068c9 ("libceph: just wait for more
> data to be available on the socket")) seems to fix. I haven't
> investigated it further, but since it'll take me some time to refresh my
> memory, I thought I should report it immediately. Maybe someone has any
> idea.
The following patch should fix it. I haven't test it yet. Will do it
later today:
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..2c32ad4d9774 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1860,10 +1860,10 @@ static int prepare_read_control_remainder(struct
ceph_connection *con)
static int prepare_read_data(struct ceph_connection *con)
{
struct bio_vec bv;
+ u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
con->in_data_crc = -1;
- ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
- data_len(con->in_msg));
+ ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
get_bvec_at(&con->v2.in_cursor, &bv);
if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) {
> Cheers,
prev parent reply other threads:[~2024-02-28 0:45 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-25 2:39 [PATCH v6 0/3] libceph: fix sparse-read failure bug xiubli
2024-01-25 2:39 ` [PATCH v6 1/3] libceph: fail the sparse-read if the data length doesn't match xiubli
2024-01-25 2:39 ` [PATCH v6 2/3] libceph: rename read_sparse_msg_XX to read_partial_sparse_msg_XX xiubli
2024-01-25 2:39 ` [PATCH v6 3/3] libceph: just wait for more data to be available on the socket xiubli
2024-02-27 17:22 ` Luis Henriques
2024-02-28 0:22 ` Xiubo Li
2024-02-28 0:45 ` Xiubo Li [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=831bfb4a-6213-4e32-8c68-252be354342e@redhat.com \
--to=xiubli@redhat.com \
--cc=ceph-devel@vger.kernel.org \
--cc=idryomov@gmail.com \
--cc=jlayton@kernel.org \
--cc=lhenriques@suse.de \
--cc=mchangir@redhat.com \
--cc=vshankar@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).