All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: fix crash of compressed writes
@ 2013-09-30 12:39 Liu Bo
  2013-09-30 17:02 ` Josef Bacik
  0 siblings, 1 reply; 5+ messages in thread
From: Liu Bo @ 2013-09-30 12:39 UTC (permalink / raw
  To: linux-btrfs

The crash[1] is found by xfstests/generic/208 with "-o compress",
it's not reproduced everytime, but it does panic.

The bug is quite interesting, it's actually introduced by a recent commit
(573aecafca1cf7a974231b759197a1aebcf39c2a,
Btrfs: actually limit the size of delalloc range).

Btrfs implements delay allocation, so during writeback, we
(1) get a page A and lock it
(2) search the state tree for delalloc bytes and lock all pages within the range
(3) process the delalloc range, including find disk space and create
    ordered extent and so on.
(4) submit the page A.

It runs well in normal cases, but if we're in a racy case, eg.
buffered compressed writes and aio-dio writes,
sometimes we may fail to lock all pages in the 'delalloc' range,
in which case, we need to fall back to search the state tree again with
a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).

The mentioned commit has a side effect, that is, in the fallback case,
we can find delalloc bytes before the index of the page we already have locked,
so we're in the case of (delalloc_end <= *start) and return with (found > 0).

This ends with not locking delalloc pages but making ->writepage still
process them, and the crash happens.

This fixes it by not enforcing the 'max_bytes' limit in the special fallback
case.

[1]:
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2170!
[...]
CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
[...]
RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[...]
[ 4934.248731] Stack:
[ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
[ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
[ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
[ 4934.248731] Call Trace:
[ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
[ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
[ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
[ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
[ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
[ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
[ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[ 4934.248731]  RSP <ffff8801869f9c48>
[ 4934.280307] ---[ end trace 36f06d3f8750236a ]---

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c09a40d..76a977e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1483,7 +1483,12 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree,
 		node = rb_next(node);
 		total_bytes += state->end - state->start + 1;
 		if (total_bytes >= max_bytes) {
-			*end = *start + max_bytes - 1;
+			/*
+			 * tell if we fall back from the failure of
+			 * lock_delalloc_pages()
+			 */
+			if (max_bytes > PAGE_CACHE_SIZE)
+				*end = *start + max_bytes - 1;
 			break;
 		}
 		if (!node)
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix crash of compressed writes
  2013-09-30 12:39 [PATCH] Btrfs: fix crash of compressed writes Liu Bo
@ 2013-09-30 17:02 ` Josef Bacik
  2013-10-01  3:37   ` Liu Bo
  0 siblings, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2013-09-30 17:02 UTC (permalink / raw
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Sep 30, 2013 at 08:39:57PM +0800, Liu Bo wrote:
> The crash[1] is found by xfstests/generic/208 with "-o compress",
> it's not reproduced everytime, but it does panic.
> 
> The bug is quite interesting, it's actually introduced by a recent commit
> (573aecafca1cf7a974231b759197a1aebcf39c2a,
> Btrfs: actually limit the size of delalloc range).
> 
> Btrfs implements delay allocation, so during writeback, we
> (1) get a page A and lock it
> (2) search the state tree for delalloc bytes and lock all pages within the range
> (3) process the delalloc range, including find disk space and create
>     ordered extent and so on.
> (4) submit the page A.
> 
> It runs well in normal cases, but if we're in a racy case, eg.
> buffered compressed writes and aio-dio writes,
> sometimes we may fail to lock all pages in the 'delalloc' range,
> in which case, we need to fall back to search the state tree again with
> a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).
> 
> The mentioned commit has a side effect, that is, in the fallback case,
> we can find delalloc bytes before the index of the page we already have locked,
> so we're in the case of (delalloc_end <= *start) and return with (found > 0).
> 
> This ends with not locking delalloc pages but making ->writepage still
> process them, and the crash happens.
> 
> This fixes it by not enforcing the 'max_bytes' limit in the special fallback
> case.
> 

Great analysis, thank you for that, however I don't like the fix ;).  Instead We
need to change the

if (!found || delalloc_end <= *start)

to always return 0 since we should not call fill delalloc if the delalloc range
doesn't start at our current offset.  Secondly the

max_bytes = PAGE_CACHE_SIZE - offset;

is completely crap since we are talking about the entire page/sector.  We should
simply set max_bytes to sectorszie or PAGE_CACHE_SIZE that way we avoid this
issue.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix crash of compressed writes
  2013-09-30 17:02 ` Josef Bacik
@ 2013-10-01  3:37   ` Liu Bo
  2013-10-01 12:38     ` Josef Bacik
  0 siblings, 1 reply; 5+ messages in thread
From: Liu Bo @ 2013-10-01  3:37 UTC (permalink / raw
  To: Josef Bacik; +Cc: linux-btrfs

On Mon, Sep 30, 2013 at 01:02:49PM -0400, Josef Bacik wrote:
> On Mon, Sep 30, 2013 at 08:39:57PM +0800, Liu Bo wrote:
> > The crash[1] is found by xfstests/generic/208 with "-o compress",
> > it's not reproduced everytime, but it does panic.
> > 
> > The bug is quite interesting, it's actually introduced by a recent commit
> > (573aecafca1cf7a974231b759197a1aebcf39c2a,
> > Btrfs: actually limit the size of delalloc range).
> > 
> > Btrfs implements delay allocation, so during writeback, we
> > (1) get a page A and lock it
> > (2) search the state tree for delalloc bytes and lock all pages within the range
> > (3) process the delalloc range, including find disk space and create
> >     ordered extent and so on.
> > (4) submit the page A.
> > 
> > It runs well in normal cases, but if we're in a racy case, eg.
> > buffered compressed writes and aio-dio writes,
> > sometimes we may fail to lock all pages in the 'delalloc' range,
> > in which case, we need to fall back to search the state tree again with
> > a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).
> > 
> > The mentioned commit has a side effect, that is, in the fallback case,
> > we can find delalloc bytes before the index of the page we already have locked,
> > so we're in the case of (delalloc_end <= *start) and return with (found > 0).
> > 
> > This ends with not locking delalloc pages but making ->writepage still
> > process them, and the crash happens.
> > 
> > This fixes it by not enforcing the 'max_bytes' limit in the special fallback
> > case.
> > 
> 
> Great analysis, thank you for that, however I don't like the fix ;).  Instead We
> need to change the
> 
> if (!found || delalloc_end <= *start)
> 
> to always return 0 since we should not call fill delalloc if the delalloc range
> doesn't start at our current offset.  Secondly the

Setting 'found = 0' is also the first idea into my mind :), but I'm not sure if
it's right.

I've check it out, commit 70b99e6959a4c28ae1b314985eca731f3db72f1d
(Btrfs: Compression corner fixes,
Signed-off-by: Chris Mason <chris.mason@oracle.com>)
introduced the (delalloc_end <= *start).

The commit log says, 
"Make sure we lock pages in the correct order during delalloc.  The
 search into the state tree for delalloc bytes can return bytes
 before the page we already have locked."

But I think if we find bytes before the page we locked, then we
should get (found = 0).

So I don't know why we need the check (delalloc_end <= *start),
maybe some corner cases need that? Chris can explain a bit?

> 
> max_bytes = PAGE_CACHE_SIZE - offset;
> 
> is completely crap since we are talking about the entire page/sector.  We should
> simply set max_bytes to sectorszie or PAGE_CACHE_SIZE that way we avoid this
> issue.  Thanks,

This might be due to historical reasons, but for now, it doesn't cause
problems as the passed @start is page aligned and @offset is always 0.

thanks,
-liubo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix crash of compressed writes
  2013-10-01  3:37   ` Liu Bo
@ 2013-10-01 12:38     ` Josef Bacik
  2013-10-01 15:42       ` Liu Bo
  0 siblings, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2013-10-01 12:38 UTC (permalink / raw
  To: Liu Bo; +Cc: Josef Bacik, linux-btrfs

On Tue, Oct 01, 2013 at 11:37:22AM +0800, Liu Bo wrote:
> On Mon, Sep 30, 2013 at 01:02:49PM -0400, Josef Bacik wrote:
> > On Mon, Sep 30, 2013 at 08:39:57PM +0800, Liu Bo wrote:
> > > The crash[1] is found by xfstests/generic/208 with "-o compress",
> > > it's not reproduced everytime, but it does panic.
> > > 
> > > The bug is quite interesting, it's actually introduced by a recent commit
> > > (573aecafca1cf7a974231b759197a1aebcf39c2a,
> > > Btrfs: actually limit the size of delalloc range).
> > > 
> > > Btrfs implements delay allocation, so during writeback, we
> > > (1) get a page A and lock it
> > > (2) search the state tree for delalloc bytes and lock all pages within the range
> > > (3) process the delalloc range, including find disk space and create
> > >     ordered extent and so on.
> > > (4) submit the page A.
> > > 
> > > It runs well in normal cases, but if we're in a racy case, eg.
> > > buffered compressed writes and aio-dio writes,
> > > sometimes we may fail to lock all pages in the 'delalloc' range,
> > > in which case, we need to fall back to search the state tree again with
> > > a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).
> > > 
> > > The mentioned commit has a side effect, that is, in the fallback case,
> > > we can find delalloc bytes before the index of the page we already have locked,
> > > so we're in the case of (delalloc_end <= *start) and return with (found > 0).
> > > 
> > > This ends with not locking delalloc pages but making ->writepage still
> > > process them, and the crash happens.
> > > 
> > > This fixes it by not enforcing the 'max_bytes' limit in the special fallback
> > > case.
> > > 
> > 
> > Great analysis, thank you for that, however I don't like the fix ;).  Instead We
> > need to change the
> > 
> > if (!found || delalloc_end <= *start)
> > 
> > to always return 0 since we should not call fill delalloc if the delalloc range
> > doesn't start at our current offset.  Secondly the
> 
> Setting 'found = 0' is also the first idea into my mind :), but I'm not sure if
> it's right.
> 
> I've check it out, commit 70b99e6959a4c28ae1b314985eca731f3db72f1d
> (Btrfs: Compression corner fixes,
> Signed-off-by: Chris Mason <chris.mason@oracle.com>)
> introduced the (delalloc_end <= *start).
> 
> The commit log says, 
> "Make sure we lock pages in the correct order during delalloc.  The
>  search into the state tree for delalloc bytes can return bytes
>  before the page we already have locked."
> 
> But I think if we find bytes before the page we locked, then we
> should get (found = 0).
> 
> So I don't know why we need the check (delalloc_end <= *start),
> maybe some corner cases need that? Chris can explain a bit?
> 

No we needed his other addition below which is important, for when the delalloc
range starts before but still encompasses our page.  The other part is not
needed, we should be returning 0 if delalloc_end lands before or on our start,
which should fix your bug and make it a bit more sane.

> > 
> > max_bytes = PAGE_CACHE_SIZE - offset;
> > 
> > is completely crap since we are talking about the entire page/sector.  We should
> > simply set max_bytes to sectorszie or PAGE_CACHE_SIZE that way we avoid this
> > issue.  Thanks,
> 
> This might be due to historical reasons, but for now, it doesn't cause
> problems as the passed @start is page aligned and @offset is always 0.
> 

Yeah you are right, still I'd kill it anyway since it doesn't make sense.
Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: fix crash of compressed writes
  2013-10-01 12:38     ` Josef Bacik
@ 2013-10-01 15:42       ` Liu Bo
  0 siblings, 0 replies; 5+ messages in thread
From: Liu Bo @ 2013-10-01 15:42 UTC (permalink / raw
  To: Josef Bacik; +Cc: linux-btrfs

On Tue, Oct 01, 2013 at 08:38:30AM -0400, Josef Bacik wrote:
> No we needed his other addition below which is important, for when the delalloc
> range starts before but still encompasses our page.  The other part is not
> needed, we should be returning 0 if delalloc_end lands before or on our start,
> which should fix your bug and make it a bit more sane.

Got it, I'm sending a new version.

-liubo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-10-01 15:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-30 12:39 [PATCH] Btrfs: fix crash of compressed writes Liu Bo
2013-09-30 17:02 ` Josef Bacik
2013-10-01  3:37   ` Liu Bo
2013-10-01 12:38     ` Josef Bacik
2013-10-01 15:42       ` Liu Bo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.