From: Xiao Ni <xni@redhat.com>
To: Wang Shanker <shankerwangmiao@gmail.com>
Cc: Ming Lei <ming.lei@redhat.com>,
	linux-block@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: [Bug Report] Discard bios cannot be correctly merged in blk-mq
Date: Wed, 9 Jun 2021 16:44:07 +0800
Message-ID: <CALTww28L7afRdVdBf-KsyF6Hvf-8-CORSCpZJAvnVbDRo6chDQ@mail.gmail.com>
In-Reply-To: <1C6DB607-B7BE-4257-8384-427BB490C9C0@gmail.com>

Hi all

Thanks for reporting this. I ran a test in my environment:

time blkdiscard /dev/nvme5n1  (477GB)
real    0m0.398s
time blkdiscard /dev/md0
real    9m16.569s

I'm not familiar with the block layer code. I'll try to understand the
code related to discard requests and try to fix this problem.

I have a question about raid5 discard: it needs to consider more cases
than raid0 and raid10. For example, take a raid5 array with 3 disks:

D11 D21 P1  (stripe size is 4KB)
D12 D22 P2
D13 D23 P3
D14 D24 P4
...  (chunk size is 512KB)

If there is a discard request covering D13 and D14, but no discard
request on D23 and D24, the discard can't simply be sent down to D13 and
D14, right? P3 = D13 xor D23, so if we discard D13 and disk2 then breaks,
the right data can't be rebuilt from D13 and P3. And a discard request on
D13 can effectively write 0 to the discarded region, right?
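
Below is a minimal user-space sketch of why that is (plain C with
illustrative values, not kernel code). It assumes a discarded chunk reads
back as zeroes, and shows that D23 can no longer be rebuilt from D13 and
P3 once D13 has been discarded without P3 being updated:

    #include <stdio.h>
    #include <string.h>

    #define CHUNK 4                     /* toy chunk size in bytes */

    int main(void)
    {
        unsigned char d13[CHUNK] = { 0xAA, 0xBB, 0xCC, 0xDD };  /* data, disk 1 */
        unsigned char d23[CHUNK] = { 0x11, 0x22, 0x33, 0x44 };  /* data, disk 2 */
        unsigned char p3[CHUNK];        /* parity chunk */
        unsigned char rebuilt[CHUNK];
        int i;

        for (i = 0; i < CHUNK; i++)
            p3[i] = d13[i] ^ d23[i];    /* P3 = D13 xor D23 */

        memset(d13, 0, CHUNK);          /* discard D13: region now reads as zeroes */

        for (i = 0; i < CHUNK; i++)     /* disk2 fails: rebuild D23 as D13 xor P3 */
            rebuilt[i] = d13[i] ^ p3[i];

        printf("rebuilt D23 correctly: %s\n",
               memcmp(rebuilt, d23, CHUNK) == 0 ? "yes" : "no");  /* prints "no" */
        return 0;
    }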

If so, raid5 can only pass a discard bio straight through when it is big
enough to cover at least one full stripe of data (data disks * chunk
size). In this example that is 1024KB (512KB * 2).
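
As a minimal sketch of that constraint (again plain C with illustrative
values, not kernel code), a discard range would be trimmed to whole
multiples of data disks * chunk size, with any misaligned head and tail
handled as ordinary stripe writes:

    #include <stdio.h>

    int main(void)
    {
        unsigned long long chunk      = 512 * 1024ULL;       /* 512KB chunk */
        unsigned long long data_disks = 2;                   /* 3-disk raid5 */
        unsigned long long stripe     = chunk * data_disks;  /* 1024KB full stripe */

        /* an example discard range, in bytes */
        unsigned long long start = 300 * 1024ULL;
        unsigned long long end   = 5 * 1024 * 1024ULL;

        /* round start up and end down to full-stripe boundaries */
        unsigned long long astart = (start + stripe - 1) / stripe * stripe;
        unsigned long long aend   = end / stripe * stripe;

        if (astart < aend)
            printf("discard whole stripes: bytes [%llu, %llu)\n", astart, aend);
        else
            printf("range smaller than one full stripe, nothing to discard\n");
        return 0;
    }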

Regards
Xiao


On Wed, Jun 9, 2021 at 10:40 AM Wang Shanker <shankerwangmiao@gmail.com> wrote:
>
>
> > On Jun 9, 2021, at 08:41, Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Jun 08, 2021 at 11:49:04PM +0800, Wang Shanker wrote:
> >>
> >>
> >> Actually, what the nvme controller receives are discard requests
> >> with 128 segments of 4k, instead of one segment of 512k.
> >
> > Right, I am just wondering if this way makes a difference wrt. a single
> > range/segment discard request from the device's viewpoint, but anyway it
> > is better to send fewer segments.
> It would make a difference when more than queue_max_discard_segments()
> bios are sent and they can be merged into bigger segments.
> >
> >>
> >>>
> >>>>
> >>>> Similarly, the problem with scsi devices can be emulated using the following
> >>>> options for qemu:
> >>>>
> >>>>       -device virtio-scsi,id=scsi \
> >>>>       -device scsi-hd,drive=nvme1,bus=scsi.0,logical_block_size=4096,discard_granularity=2097152,physical_block_size=4096,serial=NVME1 \
> >>>>       -device scsi-hd,drive=nvme2,bus=scsi.0,logical_block_size=4096,discard_granularity=2097152,physical_block_size=4096,serial=NVME2 \
> >>>>       -device scsi-hd,drive=nvme3,bus=scsi.0,logical_block_size=4096,discard_granularity=2097152,physical_block_size=4096,serial=NVME3 \
> >>>>       -trace scsi_disk_emulate_command_UNMAP,file=scsitrace.log
> >>>>
> >>>>
> >>>> Despite the discovery, I cannot come up with a proper fix of this issue due
> >>>> to my lack of familiarity of the block subsystem. I expect your kind feedback
> >>>> on this. Thanks in advance.
> >>>
> >>> In the above setting and raid456 test, I observe that rq->nr_phys_segments can
> >>> reach 128, but queue_max_discard_segments() reports 256. So discard
> >>> request size can be 512KB, which is the max size when you run 1MB discard on
> >>> raid456. However, if the discard length on raid456 is increased, the
> >>> current way will become inefficient.
> >>
> >> Exactly.
> >>
> >> I suggest that bios be merged and counted as one segment if they are
> >> contiguous and contain no data.
> >
> > Fine.
> >
> >>
> >> And I also discovered later that even normal long write requests, e.g.
> >> a 10M write, will be split into 4k bios. The maximum number of bios that
> >> can be merged into one request is limited by queue_max_segments, regardless
> >> of whether those bios are contiguous. In my test environment, for scsi
> >> devices, queue_max_segments can be 254, which allows requests of about 1M.
> >> For nvme devices (e.g. Intel DC P4610), queue_max_segments is only 33 since
> >> their MDTS is 5, which results in requests of only 132k.
> >
> > Here what matters is queue_max_discard_segments().
> Here I was considering normal read/write bios, since I first took it for
> granted that normal read/write IOs would be optimal in raid456, and then
> discovered that those 4k IOs can only be merged into not-so-big requests.
> >
> >>
> >> So, I would also suggest that raid456 be improved to issue bigger bios
> >> to the underlying drives.
> >
> > Right, that should be the root solution.
> >
> > Cc Xiao, I remember he has worked on this area.
>
> Many thanks for looking into this issue.
>
> Cheers,
>
> Miao Wang
>

