Linux-NFS Archive mirror
From: Dai Ngo <dai.ngo@oracle.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: jlayton@kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 1/3] NFSD: mark cl_cb_state as NFSD4_CB_DOWN if cl_cb_client is NULL
Date: Tue, 23 Apr 2024 13:13:47 -0700
Message-ID: <89688b35-f6dc-4103-8a13-ae4fe2865e19@oracle.com>
In-Reply-To: <Zif5B3fT5JA8GvEt@tissot.1015granger.net>


On 4/23/24 11:08 AM, Chuck Lever wrote:
> On Tue, Apr 23, 2024 at 10:49:25AM -0700, Dai Ngo wrote:
>> On 4/23/24 6:41 AM, Chuck Lever wrote:
>>> On Mon, Apr 22, 2024 at 08:12:31PM -0700, Dai Ngo wrote:
>>>> In nfsd4_run_cb_work, if the rpc_clnt for the back channel no longer
>>>> exists, the callback state in nfs4_client should be marked as NFSD4_CB_DOWN
>>>> so the server can notify the client to establish a new back channel
>>>> connection.
>>>>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>>    fs/nfsd/nfs4callback.c | 9 +++++++--
>>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>> index cf87ace7a1b0..f8bb5ff2e9ac 100644
>>>> --- a/fs/nfsd/nfs4callback.c
>>>> +++ b/fs/nfsd/nfs4callback.c
>>>> @@ -1491,9 +1491,14 @@ nfsd4_run_cb_work(struct work_struct *work)
>>>>    	clnt = clp->cl_cb_client;
>>>>    	if (!clnt) {
>>>> -		if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
>>>> +		if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
>>>>    			nfsd41_destroy_cb(cb);
>>>> -		else {
>>>> +			clear_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags);
>>>> +
>>>> +			/* let client knows BC is down when it reconnects */
>>>> +			clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags);
>>>> +			nfsd4_mark_cb_down(clp);
>>>> +		} else {
>>>>    			/*
>>>>    			 * XXX: Ideally, we could wait for the client to
>>>>    			 *	reconnect, but I haven't figured out how
>>> NFSD4_CLIENT_CB_KILL is for when the lease is getting expunged. It's
>>> not supposed to be used when only the transport is closed.
>> The reason NFSD4_CLIENT_CB_KILL needs to be set when the transport is
>> closed is because of commit c1ccfcf1a9bf3.
>>
>> When the transport is closed, nfsd4_conn_lost is called which then calls
>> nfsd4_probe_callback to set NFSD4_CLIENT_CB_UPDATE and schedule cl_cb_null
>> work to activate the callback worker (nfsd4_run_cb_work) to do the update.
>>
>> Callback worker calls nfsd4_process_cb_update to do rpc_shutdown_client
>> then clear cl_cb_client.
>>
>> When nfsd4_process_cb_update returns to nfsd4_run_cb_work, if cl_cb_client
>> is NULL and NFSD4_CLIENT_CB_KILL not set then it re-queues the callback,
>> causing an infinite loop.
> That's the way it is supposed to work today. The callback is
> re-queued until the client reconnects, at which point the loop is
> broken.

As you mentioned below, this needs to be reworked.

What if the client never comes back, for example because it was
decommissioned, or because a student hibernates the laptop and only opens
it up a few days later? Even when the client does come back, it might have
been rebooted, in which case the callback no longer means anything to it.
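
To make the concern concrete, here is a toy user-space model of the
decision the callback worker makes when cl_cb_client is NULL, based only on
the pre-patch branch quoted above. It is a sketch, not the kernel code: the
struct toy_client type, the enum cb_action values and both helper functions
are hypothetical names made up for this illustration, and the real worker
re-queues through the workqueue rather than looping in place.

```c
#include <stdbool.h>

/* Hypothetical outcomes of one pass of the callback worker. */
enum cb_action { CB_RUN, CB_DESTROY, CB_REQUEUE };

/* Hypothetical stand-in for the two pieces of nfs4_client state involved. */
struct toy_client {
	bool has_cb_client;	/* models clp->cl_cb_client != NULL */
	bool cb_kill;		/* models NFSD4_CLIENT_CB_KILL in cl_flags */
};

/* Mirrors the branch in the quoted diff: no rpc_clnt means destroy or requeue. */
static enum cb_action toy_cb_decide(const struct toy_client *clp)
{
	if (clp->has_cb_client)
		return CB_RUN;
	if (clp->cb_kill)
		return CB_DESTROY;
	return CB_REQUEUE;
}

/*
 * Return the pass number at which the callback stops being re-queued, or -1
 * if it is still being re-queued after max_passes. A negative reconnect_at
 * models a client that never reconnects.
 */
static int toy_passes_until_done(struct toy_client *clp, int reconnect_at,
				 int max_passes)
{
	int pass;

	for (pass = 0; pass < max_passes; pass++) {
		if (reconnect_at >= 0 && pass == reconnect_at)
			clp->has_cb_client = true;
		if (toy_cb_decide(clp) != CB_REQUEUE)
			return pass;
	}
	return -1;
}
```

In this model a client that never reconnects and never has the kill flag set
is still being re-queued when the cap is reached, however large the cap is.
Only a reconnect or the kill flag ends the re-queueing.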

>
>
>>> Thus, shouldn't you mark_cb_down in this arm, instead?
>> I'm not clear what you mean here, the callback worker calls
>> nfsd4_mark_cb_down after destroying the callback.
> No, I mean in the re-queue case.

In the case of re-queue, the back channel is already marked as NFSD4_CB_DOWN
and cl_flags is NFSD4_CLIENT_STABLE|NFSD4_CLIENT_RECLAIM_COMPLETE|NFSD4_CLIENT_CONFIRMED:

Apr 23 08:07:23 nfsvmc14 kernel: nfsd4_run_cb_work: NULL cl_cb_client REQUEUE CB cb[ffff888126e8a728] clp[ffff888126e8a430] cl_cb_state[2] cl_flags[0x1c]

but that does not stop the loop.
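
For anyone reading the log line above, a small user-space sketch of how
cl_flags[0x1c] decodes to the three flags named earlier. The bit positions
in the table are an assumption for illustration (CB_UPDATE = 0, CB_KILL = 1,
STABLE = 2, RECLAIM_COMPLETE = 3, CONFIRMED = 4); please check them against
fs/nfsd/state.h in your tree. The helpers decode_cl_flags() and
cb_flag_pending() are hypothetical names, not kernel functions.

```c
#include <stddef.h>
#include <string.h>

/* Assumed bit positions, indexed by bit number; verify against state.h. */
static const char *const flag_names[] = {
	"NFSD4_CLIENT_CB_UPDATE",		/* bit 0 (assumed) */
	"NFSD4_CLIENT_CB_KILL",			/* bit 1 (assumed) */
	"NFSD4_CLIENT_STABLE",			/* bit 2 (assumed) */
	"NFSD4_CLIENT_RECLAIM_COMPLETE",	/* bit 3 (assumed) */
	"NFSD4_CLIENT_CONFIRMED",		/* bit 4 (assumed) */
};

/* Write a '|'-separated list of the set flag names into buf and return buf. */
static char *decode_cl_flags(unsigned long flags, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(flags & (1UL << i)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flag_names[i], len - strlen(buf) - 1);
	}
	return buf;
}

/* Non-zero if either callback flag (CB_UPDATE or CB_KILL) is set. */
static int cb_flag_pending(unsigned long flags)
{
	return (flags & ((1UL << 0) | (1UL << 1))) != 0;
}
```

With those assumed positions, 0x1c has neither NFSD4_CLIENT_CB_UPDATE nor
NFSD4_CLIENT_CB_KILL set, which matches what the log shows: nothing in
cl_flags tells the worker to stop re-queueing.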

>
>>> Even so, isn't the
>>> backchannel already marked down when we get here?
>> No, according to my testing. Without marking the back channel down the
>> client does not re-establish the back channel when it reconnects.
> I didn't expect that closing the transport on the server side would
> need any changes in fs/nfsd/nfs4callback.c. Let me get the
> backchannel retransmit behavior sorted first. I'm still working on
> setting up a test rig here.

Thanks, I will wait until you sort this out.

-Dai

>
>
