CEPH-Devel archive mirror
 help / color / mirror / Atom feed
From: Xiubo Li <xiubli@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>, Jeff Layton <jlayton@kernel.org>
Cc: ceph-devel@vger.kernel.org, vshankar@redhat.com, mchangir@redhat.com
Subject: Re: [PATCH v4 3/3] libceph: just wait for more data to be available on the socket
Date: Tue, 23 Jan 2024 08:53:39 +0800	[thread overview]
Message-ID: <957d62aa-2ef4-4402-8cd8-d044b61c4c5e@redhat.com> (raw)
In-Reply-To: <CAOi1vP8M_85Xr20swLJzjh5y4J2ZoDe4R4ZQ602MrNtV6UcVVA@mail.gmail.com>


On 1/23/24 03:41, Ilya Dryomov wrote:
> On Mon, Jan 22, 2024 at 6:14 PM Jeff Layton <jlayton@kernel.org> wrote:
>> On Mon, 2024-01-22 at 17:55 +0100, Ilya Dryomov wrote:
>>> On Mon, Jan 22, 2024 at 4:02 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>> On Thu, 2024-01-18 at 18:50 +0800, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> The messages from ceph maybe split into multiple socket packages
>>>>> and we just need to wait for all the data to be availiable on the
>>>>> sokcet.
>>>>>
>>>>> This will add 'sr_total_resid' to record the total length for all
>>>>> data items for sparse-read message and 'sr_resid_elen' to record
>>>>> the current extent total length.
>>>>>
>>>>> URL: https://tracker.ceph.com/issues/63586
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>>   include/linux/ceph/messenger.h |  1 +
>>>>>   net/ceph/messenger_v1.c        | 32 +++++++++++++++++++++-----------
>>>>>   2 files changed, 22 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
>>>>> index 2eaaabbe98cb..ca6f82abed62 100644
>>>>> --- a/include/linux/ceph/messenger.h
>>>>> +++ b/include/linux/ceph/messenger.h
>>>>> @@ -231,6 +231,7 @@ struct ceph_msg_data {
>>>>>
>>>>>   struct ceph_msg_data_cursor {
>>>>>        size_t                  total_resid;    /* across all data items */
>>>>> +     size_t                  sr_total_resid; /* across all data items for sparse-read */
>>>>>
>>>>>        struct ceph_msg_data    *data;          /* current data item */
>>>>>        size_t                  resid;          /* bytes not yet consumed */
>>>>> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
>>>>> index 4cb60bacf5f5..2733da891688 100644
>>>>> --- a/net/ceph/messenger_v1.c
>>>>> +++ b/net/ceph/messenger_v1.c
>>>>> @@ -160,7 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
>>>>>   static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
>>>>>   {
>>>>>        /* Initialize data cursor if it's not a sparse read */
>>>>> -     if (!msg->sparse_read)
>>>>> +     if (msg->sparse_read)
>>>>> +             msg->cursor.sr_total_resid = data_len;
>>>>> +     else
>>>>>                ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
>>>>>   }
>>>>>
>>>>>
>>>> Sorry, disregard my last email.
>>>>
>>>> I went and looked at the patch in the tree, and I better understand what
>>>> you're trying to do. We already have a value that gets set to data_len
>>>> called total_resid, but that is a nonsense value in a sparse read,
>>>> because we can advance farther through the receive buffer than the
>>>> amount of data in the socket.
>>> Hi Jeff,
>>>
>>> I see that total_resid is set to sparse_data_requested(), which is
>>> effectively the size of the receive buffer, not data_len.  (This is
>>> ignoring the seemingly unnecessary complication of trying to support
>>> normal reads mixed with sparse reads in the same message, which I'm
>>> pretty sure doesn't work anyway.)
>>>
>> Oh right. I missed that bit when I was re-reviewing this. Once we're in
>> a sparse read we'll override that value. Ok, so maybe we don't need
>> sr_total_resid.
>>
Oh, I get you now. Yeah we can just reuse the 'total_resid' instead of 
adding a new one.

>>> With that, total_resid should represent the amount that needs to be
>>> filled in (advanced through) in the receive buffer.  When total_resid
>>> reaches 0, wouldn't that mean that the amount of data in the socket is
>>> also 0?  If not, where would the remaining data in the socket go?
>>>
>> With a properly formed reply, then I think yes, there should be no
>> remaining data in the socket at the end of the receive.
> There would be no actual data, but a message footer which follows the
> data section and ends the message would be outstanding.

Yeah, correct.

>> At this point I think I must just be confused about the actual problem.
>> I think I need a detailed description of it before I can properly review
>> this patch.
> AFAIU the problem is that a short read may occur while reading the
> message footer from the socket.  Later, when the socket is ready for
> another read, the messenger invokes all read_partial* handlers,
> including read_partial_sparse_msg_data().  The contract between the
> messenger and these handlers is that the handler should bail if the
> area of the message it's responsible for is already processed.  So,
> in this case, it's expected that read_sparse_msg_data() would bail,
> allowing the messenger to invoke read_partial() for the footer and
> pick up where it left off.
>
> However read_sparse_msg_data() violates that contract and ends up
> calling into the state machine in the OSD client.  The state machine
> assumes that it's a new op and interprets some piece of the footer (or
> even completely random memory?) as the sparse-read header and returns
> bogus extent length, etc.
>
> (BTW it's why I suggested the rename from read_sparse_msg_data() to
> read_partial_sparse_msg_data() in another patch -- to highlight that
> it's one of the "partial" handlers and the respective behavior.)

Yeah, Ilya is correct.

Thanks

- Xiubo

> Thanks,
>
>                  Ilya
>


      reply	other threads:[~2024-01-23  0:53 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-18 10:50 [PATCH v4 0/3] libceph: fix sparse-read failure bug xiubli
2024-01-18 10:50 ` [PATCH v4 1/3] libceph: fail the sparse-read if there still has data in socket xiubli
2024-01-18 14:03   ` Jeff Layton
2024-01-19  4:07     ` Xiubo Li
2024-01-19 11:03       ` Jeff Layton
2024-01-22  3:17         ` Xiubo Li
2024-01-18 10:50 ` [PATCH v4 2/3] libceph: rename read_sparse_msg_XX to read_partial_sparse_msg_XX xiubli
2024-01-18 14:04   ` Jeff Layton
2024-01-18 10:50 ` [PATCH v4 3/3] libceph: just wait for more data to be available on the socket xiubli
2024-01-18 14:36   ` Jeff Layton
2024-01-18 18:24   ` Jeff Layton
2024-01-19  4:35     ` Xiubo Li
2024-01-19 11:09       ` Jeff Layton
2024-01-22  2:52         ` Xiubo Li
2024-01-22 11:44           ` Jeff Layton
2024-01-22 15:02   ` Jeff Layton
2024-01-22 16:55     ` Ilya Dryomov
2024-01-22 17:14       ` Jeff Layton
2024-01-22 19:41         ` Ilya Dryomov
2024-01-23  0:53           ` Xiubo Li [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=957d62aa-2ef4-4402-8cd8-d044b61c4c5e@redhat.com \
    --to=xiubli@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=mchangir@redhat.com \
    --cc=vshankar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).