From: Xiubo Li <xiubli@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>, Jeff Layton <jlayton@kernel.org>
Cc: ceph-devel@vger.kernel.org, vshankar@redhat.com, mchangir@redhat.com
Subject: Re: [PATCH v4 3/3] libceph: just wait for more data to be available on the socket
Date: Tue, 23 Jan 2024 08:53:39 +0800 [thread overview]
Message-ID: <957d62aa-2ef4-4402-8cd8-d044b61c4c5e@redhat.com> (raw)
In-Reply-To: <CAOi1vP8M_85Xr20swLJzjh5y4J2ZoDe4R4ZQ602MrNtV6UcVVA@mail.gmail.com>
On 1/23/24 03:41, Ilya Dryomov wrote:
> On Mon, Jan 22, 2024 at 6:14 PM Jeff Layton <jlayton@kernel.org> wrote:
>> On Mon, 2024-01-22 at 17:55 +0100, Ilya Dryomov wrote:
>>> On Mon, Jan 22, 2024 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>> On Thu, 2024-01-18 at 18:50 +0800, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> The messages from ceph may be split into multiple socket packets
>>>>> and we just need to wait for all the data to be available on the
>>>>> socket.
>>>>>
>>>>> This adds 'sr_total_resid' to record the total length of all
>>>>> data items for a sparse-read message, and 'sr_resid_elen' to
>>>>> record the total length of the current extent.
>>>>>
>>>>> URL: https://tracker.ceph.com/issues/63586
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>> include/linux/ceph/messenger.h | 1 +
>>>>> net/ceph/messenger_v1.c | 32 +++++++++++++++++++++-----------
>>>>> 2 files changed, 22 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
>>>>> index 2eaaabbe98cb..ca6f82abed62 100644
>>>>> --- a/include/linux/ceph/messenger.h
>>>>> +++ b/include/linux/ceph/messenger.h
>>>>> @@ -231,6 +231,7 @@ struct ceph_msg_data {
>>>>>
>>>>> struct ceph_msg_data_cursor {
>>>>> size_t total_resid; /* across all data items */
>>>>> + size_t sr_total_resid; /* across all data items for sparse-read */
>>>>>
>>>>> struct ceph_msg_data *data; /* current data item */
>>>>> size_t resid; /* bytes not yet consumed */
>>>>> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
>>>>> index 4cb60bacf5f5..2733da891688 100644
>>>>> --- a/net/ceph/messenger_v1.c
>>>>> +++ b/net/ceph/messenger_v1.c
>>>>> @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
>>>>> static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
>>>>> {
>>>>> /* Initialize data cursor if it's not a sparse read */
>>>>> - if (!msg->sparse_read)
>>>>> + if (msg->sparse_read)
>>>>> + msg->cursor.sr_total_resid = data_len;
>>>>> + else
>>>>> ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
>>>>> }
>>>>>
>>>>>
>>>> Sorry, disregard my last email.
>>>>
>>>> I went and looked at the patch in the tree, and I better understand what
>>>> you're trying to do. We already have a value that gets set to data_len
>>>> called total_resid, but that is a nonsense value in a sparse read,
>>>> because we can advance farther through the receive buffer than the
>>>> amount of data in the socket.
>>> Hi Jeff,
>>>
>>> I see that total_resid is set to sparse_data_requested(), which is
>>> effectively the size of the receive buffer, not data_len. (This is
>>> ignoring the seemingly unnecessary complication of trying to support
>>> normal reads mixed with sparse reads in the same message, which I'm
>>> pretty sure doesn't work anyway.)
>>>
>> Oh right. I missed that bit when I was re-reviewing this. Once we're in
>> a sparse read we'll override that value. Ok, so maybe we don't need
>> sr_total_resid.
>>
Oh, I get you now. Yeah, we can just reuse 'total_resid' instead of
adding a new one.
>>> With that, total_resid should represent the amount that needs to be
>>> filled in (advanced through) in the receive buffer. When total_resid
>>> reaches 0, wouldn't that mean that the amount of data in the socket is
>>> also 0? If not, where would the remaining data in the socket go?
>>>
>> With a properly formed reply, then I think yes, there should be no
>> remaining data in the socket at the end of the receive.
> There would be no actual data, but a message footer which follows the
> data section and ends the message would be outstanding.
Yeah, correct.
>> At this point I think I must just be confused about the actual problem.
>> I think I need a detailed description of it before I can properly review
>> this patch.
> AFAIU the problem is that a short read may occur while reading the
> message footer from the socket. Later, when the socket is ready for
> another read, the messenger invokes all read_partial* handlers,
> including read_partial_sparse_msg_data(). The contract between the
> messenger and these handlers is that the handler should bail if the
> area of the message it's responsible for is already processed. So,
> in this case, it's expected that read_sparse_msg_data() would bail,
> allowing the messenger to invoke read_partial() for the footer and
> pick up where it left off.
>
> However read_sparse_msg_data() violates that contract and ends up
> calling into the state machine in the OSD client. The state machine
> assumes that it's a new op and interprets some piece of the footer (or
> even completely random memory?) as the sparse-read header and returns
> bogus extent length, etc.
>
> (BTW it's why I suggested the rename from read_sparse_msg_data() to
> read_partial_sparse_msg_data() in another patch -- to highlight that
> it's one of the "partial" handlers and the respective behavior.)
Yeah, Ilya is correct.
Thanks
- Xiubo
> Thanks,
>
> Ilya
>
Thread overview:
2024-01-18 10:50 [PATCH v4 0/3] libceph: fix sparse-read failure bug xiubli
2024-01-18 10:50 ` [PATCH v4 1/3] libceph: fail the sparse-read if there still has data in socket xiubli
2024-01-18 14:03 ` Jeff Layton
2024-01-19 4:07 ` Xiubo Li
2024-01-19 11:03 ` Jeff Layton
2024-01-22 3:17 ` Xiubo Li
2024-01-18 10:50 ` [PATCH v4 2/3] libceph: rename read_sparse_msg_XX to read_partial_sparse_msg_XX xiubli
2024-01-18 14:04 ` Jeff Layton
2024-01-18 10:50 ` [PATCH v4 3/3] libceph: just wait for more data to be available on the socket xiubli
2024-01-18 14:36 ` Jeff Layton
2024-01-18 18:24 ` Jeff Layton
2024-01-19 4:35 ` Xiubo Li
2024-01-19 11:09 ` Jeff Layton
2024-01-22 2:52 ` Xiubo Li
2024-01-22 11:44 ` Jeff Layton
2024-01-22 15:02 ` Jeff Layton
2024-01-22 16:55 ` Ilya Dryomov
2024-01-22 17:14 ` Jeff Layton
2024-01-22 19:41 ` Ilya Dryomov
2024-01-23 0:53 ` Xiubo Li [this message]