From: Michel Palleau <michel.palleau@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Graham Cobb <g.btrfs@cobb.uk.net>, linux-btrfs@vger.kernel.org
Subject: Re: scrub: last_physical not always updated when scrub is cancelled
Date: Sun, 7 Apr 2024 15:23:53 +0200
Message-ID: <CAMFk-+hBFke3_AngYmk6+hE=bF3xuR4wF+PJv+zA=3MtCN6V_g@mail.gmail.com>
In-Reply-To: <CAMFk-+g5ztbJDPw4bDo5Bo3Z8mPstZpXSB9n_UwP+sGbSGwDAQ@mail.gmail.com>

Hello Qu,

Do you know in which kernel release the fix is, or will be, integrated?
Thank you.

Best Regards,
Michel

On Sun, 10 Mar 2024 at 20:43, Michel Palleau
<michel.palleau@gmail.com> wrote:
>
> That's perfectly fine for me.
> I was just testing corner cases and reporting my findings.
> Indeed, cancelling a scrub, or even querying its progress, right after it
> has started is not a real use case.
>
> Thank you for the fast support.
> Michel
>
>
> On Sat, 9 Mar 2024 at 21:31, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >
> >
> >
> > On 2024/3/10 05:56, Michel Palleau wrote:
> > > Hello Qu,
> > >
> > > I have tested your patches today, with
> > > - btrfs scrub status | fgrep last_phys in one console, every second
> > > - btrfs scrub start/resume in another
> > >
> > > last_phys is now increasing steadily (whereas before it remained
> > > constant for several minutes).
> > >
> > > But there is still a small window after resuming a scrub where the
> > > value reported by scrub status is not consistent.
> > > Here is the output of btrfs scrub status | fgrep last_phys every second:
> > >         last_physical: 0
> > >         last_physical: 80805888
> > >         last_physical: 1986068480
> > > ...
> > >         last_physical: 50135564288
> > >         last_physical: 52316602368
> > >         last_physical: 52769587200
> > > ... (scrub has been cancelled)
> > >         last_physical: 52769587200
> > >         last_physical: 52719255552  <-- reported value is smaller than before
> >
> > IIRC a restarted scrub doesn't fully follow the start/end parameters passed
> > in, mostly because our current scrub code is fully chunk based.
> >
> > This means that, even if we updated last_physical correctly on a
> > per-stripe basis, after resuming a canceled scrub we would still restart
> > at the last chunk, not at the last extent.
> >
> > Changing this behavior would require some extra work.
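> >
> > As a rough illustration (plain userspace C, not kernel code; the 1 GiB
> > chunk size is just an assumed value for the example, real chunk boundaries
> > come from the chunk tree), a chunk-based resume rounds the saved position
> > down to the start of the containing chunk, which is why the first values
> > reported after a resume can be smaller than the value saved at cancel time:
> >
> > #include <stdio.h>
> > #include <stdint.h>
> >
> > int main(void)
> > {
> >         const uint64_t chunk_len = 1024ULL * 1024 * 1024; /* assumed 1 GiB chunk */
> >         const uint64_t saved = 52769587200ULL;    /* last_physical at cancel time */
> >         /* Resume restarts at the beginning of the chunk containing 'saved'. */
> >         const uint64_t resume_from = saved - (saved % chunk_len);
> >
> >         printf("saved last_physical: %llu\n", (unsigned long long)saved);
> >         printf("resume position:     %llu\n", (unsigned long long)resume_from);
> >         return 0;
> > }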
> >
> > >         last_physical: 54866739200
> > >         last_physical: 57014222848
> > > ...
> > >         last_physical: 74621911040
> > >         last_physical: 76844892160
> > >         last_physical: 77566312448
> > > ... (scrub has been cancelled)
> > >         last_physical: 77566312448
> > >         last_physical: 0            <-- reported value is 0; the scrub
> > > process has not updated the last_phys field yet
> > >         last_physical: 79562801152
> > >         last_physical: 81819336704
> > >
> > > I think a smaller last_physical indicates that the resume operation
> > > did not start exactly from last_physical, but somewhere before it.
> > > It can be a little surprising, but not a big deal.
> >
> > Yes, the resume would only start at the beginning of the last chunk.
> >
> > > last_physical: 0 indicates that scrub has not yet written last_phys.
> > >
> > > Then I chained scrub resume and scrub cancel. Once I saw
> > > last_physical get smaller than before,
> > > but I never saw last_physical reset back to 0. It looks like last_phys is
> > > always written before the scrub operation exits.
> > >
> > > To fix the progress report showing last_physical as 0, I propose the following change:
> > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > > index 9a39f33dc..a43dcfd6a 100644
> > > --- a/fs/btrfs/scrub.c
> > > +++ b/fs/btrfs/scrub.c
> > > @@ -2910,6 +2910,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> > >     sctx = scrub_setup_ctx(fs_info, is_dev_replace);
> > >     if (IS_ERR(sctx))
> > >         return PTR_ERR(sctx);
> > > +    sctx->stat.last_physical = start;
> > >
> > >     ret = scrub_workers_get(fs_info);
> > >     if (ret)
> >
> > The snippet looks mostly fine, but as I explained above, scrub
> > progress always restarts at the last chunk, so even if we set
> > last_physical here, it would soon be reset to a bytenr smaller than
> > @start, which can be a little confusing.
> >
> > That said, I don't really have a better idea for the value to set.
> >
> > Maybe having @last_physical at 0 for a small window isn't that bad?
> > (It would indicate that scrub has not scrubbed any content yet.)
> >
> > Thanks,
> > Qu
> > >
> > > Best Regards,
> > > Michel
> > >
> > >
> > >
> > > On Fri, 8 Mar 2024 at 10:45, Graham Cobb <g.btrfs@cobb.uk.net> wrote:
> > >>
> > >> By the way, I have been noticing this problem for some time, but haven't
> > >> got around to analysing it, sorry. I had actually assumed it was a race
> > >> condition in the user-mode handling of cancelling/resuming scrubs.
> > >>
> > >> In my case, I do regular scrubs of several disks. However, this is very
> > >> intrusive to overall system performance, so during the scrub I have scripts
> > >> which suspend critical processing that would cause problems if it timed out
> > >> (such as inbound mail handling). I suspend these processes,
> > >> run the scrub for a short while, then cancel the scrub and run the mail
> > >> for a while, then go back to suspending the mail and resuming the scrub.
> > >> Typically this means scrubs on the main system and backup disks take
> > >> several days and get cancelled and resumed *many* times.
> > >>
> > >> This has worked for many years - until recently-ish (some months ago),
> > >> when I noticed that scrub was losing track of where it had got to. It
> > >> was jumping backwards, or even, in some cases, setting last_physical
> > >> back to 0 and starting all over again!!
> > >>
> > >> I haven't had time to track it down - I just hacked the scripts to
> > >> terminate if it happened. Better to have the scrub not complete than to
> > >> hobble performance forever!
> > >>
> > >> If anyone wants to try the scripts, they are in
> > >> https://github.com/GrahamCobb/btrfs-balance-slowly (see
> > >> btrfs-scrub-slowly). A typical invocation looks like:
> > >>
> > >> /usr/local/sbin/btrfs-scrub-slowly --debug --portion-time $((10*60))
> > >> --interval $((5*60)) --hook hook-nomail /mnt/data/
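> > >>
> > >> (Roughly speaking: scrub in 10-minute portions with 5-minute pauses in
> > >> between, using the hook-nomail hook to suspend mail handling around each
> > >> portion; see the repository above for the exact flag semantics.)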
> > >>
> > >> As these scripts seem to be able to reproduce the problem fairly reliably
> > >> (although only after several hours - the filesystems I use this for range
> > >> from 7 TB to 17 TB and each take 2-3 days to fully scrub with this script),
> > >> they may be useful to someone else. Unfortunately I do not expect to be
> > >> able to build a kernel to test the proposed fix myself in the next
> > >> couple of weeks.
> > >>
> > >> Graham
> > >>
> > >>
> > >> On 08/03/2024 00:26, Qu Wenruo wrote:
> > >>>
> > >>>
> > >>> On 2024/3/8 07:07, Michel Palleau wrote:
> > >>>> Hello everyone,
> > >>>>
> > >>>> While playing with the scrub operation, using cancel and resume (with
> > >>>> btrfs-progs), I saw that my scrub operation was taking much more time
> > >>>> than expected.
> > >>>> Digging deeper, I think I found an issue on the kernel side, in the
> > >>>> updating of the last_physical field.
> > >>>>
> > >>>> I am running a 6.7.5 kernel (ArchLinux: 6.7.5-arch1-1), with a basic
> > >>>> btrfs (single device, 640 GiB used out of 922 GiB, SSD).
> > >>>>
> > >>>> Error scenario:
> > >>>> - I start a scrub, monitor it with scrub status, and when I see no
> > >>>> progress in the last_physical field (likely because it is scrubbing a
> > >>>> big chunk), I cancel the scrub;
> > >>>> - then I resume the scrub operation: if I do a scrub status,
> > >>>> last_physical is 0. If I do a scrub cancel, last_physical is still 0.
> > >>>> The state file saves 0, so the next resume will start from the very
> > >>>> beginning. Progress has been lost!
> > >>>>
> > >>>> Note that on my fs, if I do not cancel the scrub, I can see the
> > >>>> last_physical field remain constant for more than 3 minutes, while
> > >>>> data_bytes_scrubbed increases quickly. The complete scrub takes
> > >>>> less than 10 minutes.
> > >>>>
> > >>>> I have put at the bottom the outputs of the start/resume commands as
> > >>>> well as the scrub.status file after each operation.
> > >>>>
> > >>>> Looking at the kernel code, last_physical seems to be updated only
> > >>>> rarely. And on scrub cancel, the current position is not written into
> > >>>> last_physical, so the value remains whatever was last written, which
> > >>>> can be 0 if nothing has been written since the scrub was resumed.
> > >>>>
> > >>>> I see 2 problems here:
> > >>>> 1. when resuming a scrub, the returned last_physical should be at least
> > >>>> equal to the start position, so that the scrub operation does not take
> > >>>> a step backward;
> > >>>> 2. on cancel, the returned last_physical should be as close as possible
> > >>>> to the current scrub position, so that the resume operation does not
> > >>>> redo the same work. Several minutes without an update is a waste.
> > >>>>
> > >>>> Problem 1 is pretty easy to fix: in btrfs_scrub_dev(), fill the
> > >>>> last_physical field with the start parameter after the context has
> > >>>> been initialized.
> > >>>
> > >>> Indeed, we're updating last_physical way too infrequently.
> > >>>
> > >>>> Problem 2 looks more difficult: updating last_physical more often
> > >>>> implies being able to actually resume from that position.
> > >>>
> > >>> The truth is, every time we finish a stripe we should update
> > >>> last_physical, so that in the resume case we would waste at most one
> > >>> stripe (64K), which should be minimal compared to the size of the fs.
> > >>>
> > >>> This is not hard to do inside flush_scrub_stripes() for non-RAID56
> > >>> profiles.
> > >>>
> > >>> It may need slightly more handling for RAID56, but overall I believe
> > >>> it can be done.
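> > >>>
> > >>> As a minimal sketch of the idea (not the actual patch; the
> > >>> scrub_ctx/scrub_stripe member names below are assumptions based on the
> > >>> current code), the non-RAID56 path could bump the reported progress
> > >>> after each finished stripe roughly like this:
> > >>>
> > >>> /* Sketch only: advance reported progress past one finished stripe. */
> > >>> static void scrub_bump_last_physical(struct scrub_ctx *sctx,
> > >>>                                      const struct scrub_stripe *stripe)
> > >>> {
> > >>>         spin_lock(&sctx->stat_lock);
> > >>>         sctx->stat.last_physical = stripe->physical + BTRFS_STRIPE_LEN;
> > >>>         spin_unlock(&sctx->stat_lock);
> > >>> }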
> > >>>
> > >>> Let me craft a patch for you to test soon.
> > >>>
> > >>> Thanks,
> > >>> Qu
> > >>>
> > >>>
> > >>>>
> > >>>> Here are output of the different steps:
> > >>>>
> > >>>> # btrfs scrub start -BR /mnt/clonux_btrfs
> > >>>> Starting scrub on devid 1
> > >>>> scrub canceled for 4c61ff6d-a903-42f6-b490-a3ce3690604e
> > >>>> Scrub started:    Thu Mar  7 17:11:17 2024
> > >>>> Status:           aborted
> > >>>> Duration:         0:00:22
> > >>>>           data_extents_scrubbed: 1392059
> > >>>>           tree_extents_scrubbed: 57626
> > >>>>           data_bytes_scrubbed: 44623339520
> > >>>>           tree_bytes_scrubbed: 944144384
> > >>>>           read_errors: 0
> > >>>>           csum_errors: 0
> > >>>>           verify_errors: 0
> > >>>>           no_csum: 1853
> > >>>>           csum_discards: 0
> > >>>>           super_errors: 0
> > >>>>           malloc_errors: 0
> > >>>>           uncorrectable_errors: 0
> > >>>>           unverified_errors: 0
> > >>>>           corrected_errors: 0
> > >>>>           last_physical: 36529242112
> > >>>>
> > >>>> # cat scrub.status.4c61ff6d-a903-42f6-b490-a3ce3690604e
> > >>>> scrub status:1
> > >>>> 4c61ff6d-a903-42f6-b490-a3ce3690604e:1|data_extents_scrubbed:1392059|tree_extents_scrubbed:57626|data_bytes_scrubbed:44623339520|tree_bytes_scrubbed:944144384|read_errors:0|csum_errors:0|verify_errors:0|no_csum:1853|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:36529242112|t_start:1709827877|t_resumed:0|duration:22|canceled:1|finished:1
> > >>>>
> > >>>> # btrfs scrub resume -BR /mnt/clonux_btrfs
> > >>>> Starting scrub on devid 1
> > >>>> scrub canceled for 4c61ff6d-a903-42f6-b490-a3ce3690604e
> > >>>> Scrub started:    Thu Mar  7 17:13:07 2024
> > >>>> Status:           aborted
> > >>>> Duration:         0:00:07
> > >>>>           data_extents_scrubbed: 250206
> > >>>>           tree_extents_scrubbed: 0
> > >>>>           data_bytes_scrubbed: 14311002112
> > >>>>           tree_bytes_scrubbed: 0
> > >>>>           read_errors: 0
> > >>>>           csum_errors: 0
> > >>>>           verify_errors: 0
> > >>>>           no_csum: 591
> > >>>>           csum_discards: 0
> > >>>>           super_errors: 0
> > >>>>           malloc_errors: 0
> > >>>>           uncorrectable_errors: 0
> > >>>>           unverified_errors: 0
> > >>>>           corrected_errors: 0
> > >>>>           last_physical: 0
> > >>>>
> > >>>> # cat scrub.status.4c61ff6d-a903-42f6-b490-a3ce3690604e
> > >>>> scrub status:1
> > >>>> 4c61ff6d-a903-42f6-b490-a3ce3690604e:1|data_extents_scrubbed:1642265|tree_extents_scrubbed:57626|data_bytes_scrubbed:58934341632|tree_bytes_scrubbed:944144384|read_errors:0|csum_errors:0|verify_errors:0|no_csum:2444|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:0|t_start:1709827877|t_resumed:1709827987|duration:29|canceled:1|finished:1
> > >>>>
> > >>>> Best Regards,
> > >>>> Michel Palleau
> > >>>>
> > >>>
> > >>

Thread overview: 7+ messages
2024-03-07 20:37 scrub: last_physical not always updated when scrub is cancelled Michel Palleau
2024-03-08  0:26 ` Qu Wenruo
2024-03-08  9:45   ` Graham Cobb
2024-03-09 19:26     ` Michel Palleau
2024-03-09 20:31       ` Qu Wenruo
2024-03-10 19:43         ` Michel Palleau
2024-04-07 13:23           ` Michel Palleau [this message]
