* crash in rbd_img_request_create
@ 2014-05-10 22:18 Hannes Landeholm
  2014-05-11  3:11 ` Alex Elder
  0 siblings, 1 reply; 11+ messages in thread
From: Hannes Landeholm @ 2014-05-10 22:18 UTC
  To: Ceph Development, Alex Elder, Ilya Dryomov; +Cc: Thorwald Lundqvist

Hello,

I have a development machine that I have been running stress tests on
for a week as I'm trying to reproduce some hard to reproduce failures.
I've mentioned the same machine previously in the thread "rbd unmap
deadlock". I just now noticed that some processes had completely
stalled. I looked in the system log and saw this crash about 9 hours
ago:

kernel: BUG: unable to handle kernel paging request at ffff87ff3fbcdc58
kernel: IP: [<ffffffffa0357203>] rbd_img_request_fill+0x123/0x6d0 [rbd]
kernel: PGD 0
kernel: Oops: 0000 [#1] PREEMPT SMP
kernel: Modules linked in: xt_recent xt_conntrack ipt_REJECT xt_limit
xt_tcpudp iptable_filter veth ipt_MASQUERADE iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
ip_tables x_tables cbc bridge stp llc coretemp x86_pkg_temp_thermal
intel_powerclamp kvm_intel kvm cr
kernel:  crc32c libcrc32c ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom
crc_t10dif crct10dif_common atkbd libps2 ahci libahci libata ehci_pci
xhci_hcd ehci_hcd scsi_mod usbcore usb_common i8042 serio
kernel: CPU: 4 PID: 3015 Comm: mysqld Tainted: P           O 3.14.1-1-js #1
kernel: Hardware name: ASUSTeK COMPUTER INC. RS100-E8-PI2/P9D-M
Series, BIOS 0302 05/10/2013
kernel: task: ffff88003f046220 ti: ffff88011d3d2000 task.ti: ffff88011d3d2000
kernel: RIP: 0010:[<ffffffffa0357203>]  [<ffffffffa0357203>]
rbd_img_request_fill+0x123/0x6d0 [rbd]
kernel: RSP: 0018:ffff88011d3d3ac0  EFLAGS: 00010286
kernel: RAX: ffff87ff3fbcdc00 RBX: 0000000008814000 RCX: 00000000011bcf84
kernel: RDX: ffffffffa035c867 RSI: 0000000000000065 RDI: ffff8800b338f000
kernel: RBP: ffff88011d3d3b78 R08: 000000000001abe0 R09: ffffffffa03571e0
kernel: R10: 772d736a2f73656e R11: 6e61682d637a762f R12: ffff8800b338f000
kernel: R13: ffff88025609d100 R14: 0000000000000000 R15: 0000000000000001
kernel: FS:  00007fffe17fb700(0000) GS:ffff88042fd00000(0000)
knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffff87ff3fbcdc58 CR3: 0000000126e0e000 CR4: 00000000001407e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Stack:
kernel:  ffff880128ad0d98 0000000000000000 000022011d3d3bb8 ffff87ff3fbcdc20
kernel:  ffff87ff3fbcdcc8 ffff8803b6459c90 682d637a762fea80 0000000000000001
kernel:  0000000000000000 ffff87ff3fbcdc00 ffff8803b6459c30 0000000000004000
kernel: Call Trace:
kernel:  [<ffffffffa03554d5>] ? rbd_img_request_create+0x155/0x220 [rbd]
kernel:  [<ffffffff8125cab9>] ? blk_add_timer+0x19/0x20
kernel:  [<ffffffffa035aa1d>] rbd_request_fn+0x1ed/0x330 [rbd]
kernel:  [<ffffffff81252f13>] __blk_run_queue+0x33/0x40
kernel:  [<ffffffff8127a4dd>] cfq_insert_request+0x34d/0x560
kernel:  [<ffffffff8124fa1c>] __elv_add_request+0x1bc/0x300
kernel:  [<ffffffff81256cd0>] blk_flush_plug_list+0x1d0/0x230
kernel:  [<ffffffff812570a4>] blk_finish_plug+0x14/0x40
kernel:  [<ffffffffa027fd6e>] ext4_writepages+0x48e/0xd50 [ext4]
kernel:  [<ffffffff811417ae>] do_writepages+0x1e/0x40
kernel:  [<ffffffff811363d9>] __filemap_fdatawrite_range+0x59/0x60
kernel:  [<ffffffff811364da>] filemap_write_and_wait_range+0x2a/0x70
kernel:  [<ffffffffa027749a>] ext4_sync_file+0xba/0x360 [ext4]
kernel:  [<ffffffff811d50ce>] do_fsync+0x4e/0x80
kernel:  [<ffffffff811d5350>] SyS_fsync+0x10/0x20
kernel:  [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
kernel: Code: 00 00 00 e8 a0 25 e3 e0 48 85 c0 49 89 c4 0f 84 0c 04 00
00 48 8b 45 90 48 8b 5d b0 48 c7 c2 67 c8 35 a0 be 65 00 00 00 4c 89
e7 <0f> b6 48 58 48 d3 eb 83 78 18 02 48 89 c1 48 8b 49 50 48 c7 c0
kernel: RIP  [<ffffffffa0357203>] rbd_img_request_fill+0x123/0x6d0 [rbd]
kernel:  RSP <ffff88011d3d3ac0>
kernel: CR2: ffff87ff3fbcdc58
kernel: ---[ end trace bebc1d7ea3182129 ]---

uname: Linux localhost 3.14.1-1-js #1 SMP PREEMPT Tue Apr 15 17:59:05
CEST 2014 x86_64 GNU/Linux

This is a "stock" Arch 3.14.1 kernel with no custom patches.

For some reason the rest of the system still works fine, but trying to
clean up with SIGKILL just leaves the system full of unkillable zombie
processes.

The Ceph cluster looks fine; I ran a successful deep scrub as well. It
still uses the same machine, but it runs a new cluster now:

    cluster 32c6af82-73ff-4ea8-9220-cd47c6976ecb
     health HEALTH_WARN
     monmap e1: 1 mons at {margarina=192.168.0.215:6789/0}, election
epoch 1, quorum 0 margarina
     osdmap e54: 2 osds: 2 up, 2 in
      pgmap v62043: 492 pgs, 6 pools, 4240 MB data, 1182 objects
            18810 MB used, 7083 GB / 7101 GB avail
                 492 active+clean

2014-05-11 00:03:00.551688 mon.0 [INF] pgmap v62043: 492 pgs: 492
active+clean; 4240 MB data, 18810 MB used, 7083 GB / 7101 GB avail

Trying to unmap the related rbd volume goes horribly wrong. "rbd
unmap" waits for a child process (wait4) with an empty cmdline that
has deadlocked with the following stack:

[<ffffffff811e83b3>] fsnotify_clear_marks_by_group_flags+0x33/0xb0
[<ffffffff811e8443>] fsnotify_clear_marks_by_group+0x13/0x20
[<ffffffff811e75c2>] fsnotify_destroy_group+0x12/0x50
[<ffffffff811e96a2>] inotify_release+0x22/0x50
[<ffffffff811a811c>] __fput+0x9c/0x220
[<ffffffff811a82ee>] ____fput+0xe/0x10
[<ffffffff810848ec>] task_work_run+0xbc/0xe0
[<ffffffff81067556>] do_exit+0x2a6/0xa70
[<ffffffff814df85b>] oops_end+0x9b/0xe0
[<ffffffff814d5f8a>] no_context+0x296/0x2a3
[<ffffffff814d601d>] __bad_area_nosemaphore+0x86/0x1dc
[<ffffffff814d6186>] bad_area_nosemaphore+0x13/0x15
[<ffffffff814e1e4e>] __do_page_fault+0x3ce/0x5a0
[<ffffffff814e2042>] do_page_fault+0x22/0x30
[<ffffffff814ded38>] page_fault+0x28/0x30
[<ffffffff811ea249>] SyS_inotify_add_watch+0x219/0x360
[<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

As before, rbd likely still doesn't contain any debug symbols, as we
haven't recompiled anything yet. I should really get that done; I can
double-check if that would really help you.

I will probably hard-reboot this machine soon so I can continue my
stress tests, so if you want me to pull any other data out of the
runtime state, please reply immediately.

Thank you for your time,
--
Hannes Landeholm
Co-founder & CTO
Jumpstarter - www.jumpstarter.io

☎ +46 72 301 35 62


* Re: crash in rbd_img_request_create
  2014-05-10 22:18 crash in rbd_img_request_create Hannes Landeholm
@ 2014-05-11  3:11 ` Alex Elder
  2014-05-11  9:33   ` Ilya Dryomov
  2014-05-11 16:33   ` Hannes Landeholm
  0 siblings, 2 replies; 11+ messages in thread
From: Alex Elder @ 2014-05-11  3:11 UTC
  To: Hannes Landeholm, Ceph Development, Ilya Dryomov; +Cc: Thorwald Lundqvist

On 05/10/2014 05:18 PM, Hannes Landeholm wrote:
> Hello,
> 
> I have a development machine that I have been running stress tests on
> for a week as I'm trying to reproduce some hard to reproduce failures.
> I've mentioned the same machine previously in the thread "rbd unmap
> deadlock". I just now noticed that some processes had completely
> stalled. I looked in the system log and saw this crash about 9 hours
> ago:

Are you still running kernel rbd as a client of ceph
services running on the same physical machine?

I personally believe that scenario may be at risk of
deadlock in any case--we haven't taken great care to
avoid it in this case.

Anyway...

I can build v3.14.1 but I don't know what kernel configuration
you are using.  Knowing that could be helpful.  I built it using
a config I have though, and it's *possible* you crashed on
this line, in rbd_segment_name():
        ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
                        rbd_dev->header.object_prefix, segment);
And if so, the only reason I can think that this failed is if
rbd_dev->header.object_prefix were null (or an otherwise bad
pointer value).  But at this point it's a lot of speculation.
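
For context, here is roughly what that function looks like in this
era's drivers/block/rbd.c.  I'm reconstructing it from memory, so
treat it as a sketch rather than the exact 3.14.1 source:

static const char *rbd_segment_name(struct rbd_device *rbd_dev, u64 offset)
{
        char *name;
        u64 segment;
        int ret;
        char *name_format;

        /* object names come from a dedicated kmem cache */
        name = kmem_cache_alloc(rbd_segment_name_cache, GFP_NOIO);
        if (!name)
                return NULL;
        segment = offset >> rbd_dev->header.obj_order;
        name_format = "%s.%012llx";
        if (rbd_dev->image_format == 2)
                name_format = "%s.%016llx";
        ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
                        rbd_dev->header.object_prefix, segment);
        if (ret < 0 || ret > CEPH_MAX_OID_NAME_LEN) {
                pr_err("error formatting segment name for #%llu (%d)\n",
                        segment, ret);
                kmem_cache_free(rbd_segment_name_cache, name);
                name = NULL;
        }

        return name;
}

Note that both rbd_dev->header.obj_order and
rbd_dev->header.object_prefix get read here, so a stale or otherwise
bad rbd_dev pointer would explain a fault on either access.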

Depending on what your stress tests were doing, I suppose it
could be that you unmapped an in-use rbd image and there was
some sort of insufficient locking.

Can you also give a little insight about what your stress
tests were doing?

Thanks.

					-Alex

> kernel: BUG: unable to handle kernel paging request at ffff87ff3fbcdc58
> kernel: IP: [<ffffffffa0357203>] rbd_img_request_fill+0x123/0x6d0 [rbd]
> kernel: PGD 0
> kernel: Oops: 0000 [#1] PREEMPT SMP
> kernel: Modules linked in: xt_recent xt_conntrack ipt_REJECT xt_limit
> xt_tcpudp iptable_filter veth ipt_MASQUERADE iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> ip_tables x_tables cbc bridge stp llc coretemp x86_pkg_temp_thermal
> intel_powerclamp kvm_intel kvm cr
> kernel:  crc32c libcrc32c ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom
> crc_t10dif crct10dif_common atkbd libps2 ahci libahci libata ehci_pci
> xhci_hcd ehci_hcd scsi_mod usbcore usb_common i8042 serio
> kernel: CPU: 4 PID: 3015 Comm: mysqld Tainted: P           O 3.14.1-1-js #1
> kernel: Hardware name: ASUSTeK COMPUTER INC. RS100-E8-PI2/P9D-M
> Series, BIOS 0302 05/10/2013
> kernel: task: ffff88003f046220 ti: ffff88011d3d2000 task.ti: ffff88011d3d2000
> kernel: RIP: 0010:[<ffffffffa0357203>]  [<ffffffffa0357203>]
> rbd_img_request_fill+0x123/0x6d0 [rbd]
> kernel: RSP: 0018:ffff88011d3d3ac0  EFLAGS: 00010286
> kernel: RAX: ffff87ff3fbcdc00 RBX: 0000000008814000 RCX: 00000000011bcf84
> kernel: RDX: ffffffffa035c867 RSI: 0000000000000065 RDI: ffff8800b338f000
> kernel: RBP: ffff88011d3d3b78 R08: 000000000001abe0 R09: ffffffffa03571e0
> kernel: R10: 772d736a2f73656e R11: 6e61682d637a762f R12: ffff8800b338f000
> kernel: R13: ffff88025609d100 R14: 0000000000000000 R15: 0000000000000001
> kernel: FS:  00007fffe17fb700(0000) GS:ffff88042fd00000(0000)
> knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: ffff87ff3fbcdc58 CR3: 0000000126e0e000 CR4: 00000000001407e0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> kernel: Stack:
> kernel:  ffff880128ad0d98 0000000000000000 000022011d3d3bb8 ffff87ff3fbcdc20
> kernel:  ffff87ff3fbcdcc8 ffff8803b6459c90 682d637a762fea80 0000000000000001
> kernel:  0000000000000000 ffff87ff3fbcdc00 ffff8803b6459c30 0000000000004000
> kernel: Call Trace:
> kernel:  [<ffffffffa03554d5>] ? rbd_img_request_create+0x155/0x220 [rbd]
> kernel:  [<ffffffff8125cab9>] ? blk_add_timer+0x19/0x20
> kernel:  [<ffffffffa035aa1d>] rbd_request_fn+0x1ed/0x330 [rbd]
> kernel:  [<ffffffff81252f13>] __blk_run_queue+0x33/0x40
> kernel:  [<ffffffff8127a4dd>] cfq_insert_request+0x34d/0x560
> kernel:  [<ffffffff8124fa1c>] __elv_add_request+0x1bc/0x300
> kernel:  [<ffffffff81256cd0>] blk_flush_plug_list+0x1d0/0x230
> kernel:  [<ffffffff812570a4>] blk_finish_plug+0x14/0x40
> kernel:  [<ffffffffa027fd6e>] ext4_writepages+0x48e/0xd50 [ext4]
> kernel:  [<ffffffff811417ae>] do_writepages+0x1e/0x40
> kernel:  [<ffffffff811363d9>] __filemap_fdatawrite_range+0x59/0x60
> kernel:  [<ffffffff811364da>] filemap_write_and_wait_range+0x2a/0x70
> kernel:  [<ffffffffa027749a>] ext4_sync_file+0xba/0x360 [ext4]
> kernel:  [<ffffffff811d50ce>] do_fsync+0x4e/0x80
> kernel:  [<ffffffff811d5350>] SyS_fsync+0x10/0x20
> kernel:  [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
> kernel: Code: 00 00 00 e8 a0 25 e3 e0 48 85 c0 49 89 c4 0f 84 0c 04 00
> 00 48 8b 45 90 48 8b 5d b0 48 c7 c2 67 c8 35 a0 be 65 00 00 00 4c 89
> e7 <0f> b6 48 58 48 d3 eb 83 78 18 02 48 89 c1 48 8b 49 50 48 c7 c0
> kernel: RIP  [<ffffffffa0357203>] rbd_img_request_fill+0x123/0x6d0 [rbd]
> kernel:  RSP <ffff88011d3d3ac0>
> kernel: CR2: ffff87ff3fbcdc58
> kernel: ---[ end trace bebc1d7ea3182129 ]---
> 
> uname: Linux localhost 3.14.1-1-js #1 SMP PREEMPT Tue Apr 15 17:59:05
> CEST 2014 x86_64 GNU/Linux
> 
> This is a "stock" Arch 3.14.1 kernel with no custom patches.
> 
> For some reason the rest of the system still works fine, but trying to
> clean up with SIGKILL just leaves the system full of unkillable zombie
> processes.
> 
> The Ceph cluster looks fine; I ran a successful deep scrub as well. It
> still uses the same machine, but it runs a new cluster now:
> 
>     cluster 32c6af82-73ff-4ea8-9220-cd47c6976ecb
>      health HEALTH_WARN
>      monmap e1: 1 mons at {margarina=192.168.0.215:6789/0}, election
> epoch 1, quorum 0 margarina
>      osdmap e54: 2 osds: 2 up, 2 in
>       pgmap v62043: 492 pgs, 6 pools, 4240 MB data, 1182 objects
>             18810 MB used, 7083 GB / 7101 GB avail
>                  492 active+clean
> 
> 2014-05-11 00:03:00.551688 mon.0 [INF] pgmap v62043: 492 pgs: 492
> active+clean; 4240 MB data, 18810 MB used, 7083 GB / 7101 GB avail
> 
> Trying to unmap the related rbd volume goes horribly wrong. "rbd
> unmap" waits for a child process (wait4) with an empty cmdline that
> has deadlocked with the following stack:
> 
> [<ffffffff811e83b3>] fsnotify_clear_marks_by_group_flags+0x33/0xb0
> [<ffffffff811e8443>] fsnotify_clear_marks_by_group+0x13/0x20
> [<ffffffff811e75c2>] fsnotify_destroy_group+0x12/0x50
> [<ffffffff811e96a2>] inotify_release+0x22/0x50
> [<ffffffff811a811c>] __fput+0x9c/0x220
> [<ffffffff811a82ee>] ____fput+0xe/0x10
> [<ffffffff810848ec>] task_work_run+0xbc/0xe0
> [<ffffffff81067556>] do_exit+0x2a6/0xa70
> [<ffffffff814df85b>] oops_end+0x9b/0xe0
> [<ffffffff814d5f8a>] no_context+0x296/0x2a3
> [<ffffffff814d601d>] __bad_area_nosemaphore+0x86/0x1dc
> [<ffffffff814d6186>] bad_area_nosemaphore+0x13/0x15
> [<ffffffff814e1e4e>] __do_page_fault+0x3ce/0x5a0
> [<ffffffff814e2042>] do_page_fault+0x22/0x30
> [<ffffffff814ded38>] page_fault+0x28/0x30
> [<ffffffff811ea249>] SyS_inotify_add_watch+0x219/0x360
> [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> As before, rbd likely still doesn't contain any debug symbols, as we
> haven't recompiled anything yet. I should really get that done; I can
> double-check if that would really help you.
> 
> I will probably hard-reboot this machine soon so I can continue my
> stress tests, so if you want me to pull any other data out of the
> runtime state, please reply immediately.
> 
> Thank you for your time,
> --
> Hannes Landeholm
> Co-founder & CTO
> Jumpstarter - www.jumpstarter.io
> 
> ☎ +46 72 301 35 62
> 



* Re: crash in rbd_img_request_create
  2014-05-11  3:11 ` Alex Elder
@ 2014-05-11  9:33   ` Ilya Dryomov
  2014-05-12  4:34     ` Alex Elder
  2014-05-11 16:33   ` Hannes Landeholm
  1 sibling, 1 reply; 11+ messages in thread
From: Ilya Dryomov @ 2014-05-11  9:33 UTC
  To: Alex Elder; +Cc: Hannes Landeholm, Ceph Development, Thorwald Lundqvist

On Sun, May 11, 2014 at 7:11 AM, Alex Elder <elder@ieee.org> wrote:
> On 05/10/2014 05:18 PM, Hannes Landeholm wrote:
>> Hello,
>>
>> I have a development machine that I have been running stress tests on
>> for a week as I'm trying to reproduce some hard to reproduce failures.
>> I've mentioned the same machine previously in the thread "rbd unmap
>> deadlock". I just now noticed that some processes had completely
>> stalled. I looked in the system log and saw this crash about 9 hours
>> ago:
>
> Are you still running kernel rbd as a client of ceph
> services running on the same physical machine?
>
> I personally believe that scenario may be at risk of
> deadlock in any case--we haven't taken great care to
> avoid it in this case.
>
> Anyway...
>
> I can build v3.14.1 but I don't know what kernel configuration
> you are using.  Knowing that could be helpful.  I built it using
> a config I have though, and it's *possible* you crashed on
> this line, in rbd_segment_name():
>         ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
>                         rbd_dev->header.object_prefix, segment);
> And if so, the only reason I can think that this failed is if
> rbd_dev->header.object_prefix were null (or an otherwise bad
> pointer value).  But at this point it's a lot of speculation.

More precisely, it crashed on

segment = offset >> rbd_dev->header.obj_order;

while loading obj_order.  rbd_dev is ffff87ff3fbcdc00, which suggests
a use after free of some sort.  (This is the first rbd_dev deref after
grabbing it from img_request at the top of rbd_img_request_fill(),
which got it from request_queue::queuedata in rbd_request_fn().)
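
To spell the chain out (paraphrasing the code path, not quoting the
exact 3.14.1 source):

/* in rbd_request_fn(): */
        struct rbd_device *rbd_dev = q->queuedata;  /* stale if rbd_dev was torn down */
        ...
        img_request = rbd_img_request_create(rbd_dev, offset, length,
                                             write_request);

/* at the top of rbd_img_request_fill(): */
        struct rbd_device *rbd_dev = img_request->rbd_dev;

/* first dereference, in rbd_segment_name(), presumably inlined here: */
        segment = offset >> rbd_dev->header.obj_order;
        /* faults, with rbd_dev == RAX == ffff87ff3fbcdc00 */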

Thanks,

                Ilya


* Re: crash in rbd_img_request_create
  2014-05-11  3:11 ` Alex Elder
  2014-05-11  9:33   ` Ilya Dryomov
@ 2014-05-11 16:33   ` Hannes Landeholm
  1 sibling, 0 replies; 11+ messages in thread
From: Hannes Landeholm @ 2014-05-11 16:33 UTC
  To: Alex Elder, Ceph Development, Ilya Dryomov; +Cc: Thorwald Lundqvist

> Are you still running kernel rbd as a client of ceph
> services running on the same physical machine?
>
> I personally believe that scenario may be at risk of
> deadlock in any case--we haven't taken great care to
> avoid it in this case.

Yes. Risking a deadlock on this machine is fine though, we only use it
for development and testing.

> Anyway...
>
> I can build v3.14.1 but I don't know what kernel configuration
> you are using.  Knowing that could be helpful.  I built it using
> a config I have though, and it's *possible* you crashed on
> this line, in rbd_segment_name():
>         ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
>                         rbd_dev->header.object_prefix, segment);
> And if so, the only reason I can think that this failed is if
> rbd_dev->header.object_prefix were null (or an otherwise bad
> pointer value).  But at this point it's a lot of speculation.

config: http://pastebin.com/unZCzXZZ

> Depending on what your stress tests were doing, I suppose it
> could be that you unmapped an in-use rbd image and there was
> some sort of insufficient locking.
>
> Can you also give a little insight about what your stress
> tests were doing?

The stress testing had about 3 rbd volumes constantly mapped. A
standard web stack (LNMP) was installed on them, with a WordPress
installation that was hammered with requests to PHP, which in turn
made calls to MySQL. All volumes used ext4, and one of them hosted
the raw MySQL InnoDB data files. From the stack trace it looks like
mysqld did an fsync which caused the failure in rbd. The server was
otherwise completely unused; no concurrent rbd mapping took place.
The rbd images used layering, but there should be a maximum of about
3 layers.

Thank you for your time,
Hannes


* Re: crash in rbd_img_request_create
  2014-05-11  9:33   ` Ilya Dryomov
@ 2014-05-12  4:34     ` Alex Elder
  2014-05-12 17:28       ` Hannes Landeholm
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Elder @ 2014-05-12  4:34 UTC
  To: Ilya Dryomov; +Cc: Hannes Landeholm, Ceph Development, Thorwald Lundqvist

On 05/11/2014 04:33 AM, Ilya Dryomov wrote:
> On Sun, May 11, 2014 at 7:11 AM, Alex Elder <elder@ieee.org> wrote:
>> On 05/10/2014 05:18 PM, Hannes Landeholm wrote:
>>> Hello,
>>>
>>> I have a development machine that I have been running stress tests on
>>> for a week as I'm trying to reproduce some hard to reproduce failures.
>>> I've mentioned the same machine previously in the thread "rbd unmap
>>> deadlock". I just now noticed that some processes had completely
>>> stalled. I looked in the system log and saw this crash about 9 hours
>>> ago:
>>
>> Are you still running kernel rbd as a client of ceph
>> services running on the same physical machine?
>>
>> I personally believe that scenario may be at risk of
>> deadlock in any case--we haven't taken great care to
>> avoid it in this case.
>>
>> Anyway...
>>
>> I can build v3.14.1 but I don't know what kernel configuration
>> you are using.  Knowing that could be helpful.  I built it using
>> a config I have though, and it's *possible* you crashed on
>> this line, in rbd_segment_name():
>>         ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
>>                         rbd_dev->header.object_prefix, segment);
>> And if so, the only reason I can think that this failed is if
>> rbd_dev->header.object_prefix were null (or an otherwise bad
>> pointer value).  But at this point it's a lot of speculation.
> 
> More precisely, it crashed on
> 
> segment = offset >> rbd_dev->header.obj_order;

After looking more closely at this tonight I can say I concur.

kernel: BUG: unable to handle kernel paging request at ffff87ff3fbcdc58
RAX: ffff87ff3fbcdc00

    2483:       00 00 00 be             movzbl 0x58(%rax),%ecx

Unfortunately that's about all I can say right now.

Since the stack includes rbd_request_fn() we know it's a
request that came from the block layer--which means that
the rbd_img_request_create() call was not being done for
a parent image request.  On the other hand, if you're right
about use-after-free, it could still involve an image request
created through that code path (if a parent image
request were freed while it was still in use).

Hannes indicated layered images were involved.

More later...

					-Alex

> while loading obj_order.  rbd_dev is ffff87ff3fbcdc00, which suggests
> a use after free of some sort.  (This is the first rbd_dev deref after
> grabbing it from img_request at the top of rbd_img_request_fill(),
> which got it from request_queue::queuedata in rbd_request_fn().)
> 
> Thanks,
> 
>                 Ilya
> 



* Re: crash in rbd_img_request_create
  2014-05-12  4:34     ` Alex Elder
@ 2014-05-12 17:28       ` Hannes Landeholm
  2014-05-13 12:35         ` Alex Elder
  0 siblings, 1 reply; 11+ messages in thread
From: Hannes Landeholm @ 2014-05-12 17:28 UTC
  To: Alex Elder, Ceph Development, Ilya Dryomov; +Cc: Thorwald Lundqvist

FYI, we just saw two other kernel paging failures unrelated to rbd, so
rbd might have been the victim and not the culprit:

May 12 17:43:20 localhost kernel: BUG: unable to handle kernel paging
request at ffffffff81666480
May 12 17:43:20 localhost kernel: IP: [<ffffffff810b0b2b>] mspin_lock+0x2b/0x40
May 12 17:43:20 localhost kernel: PGD 180f067 PUD 1810063 PMD 80000000016001e1
May 12 17:43:20 localhost kernel: Oops: 0003 [#1] PREEMPT SMP
May 12 17:43:20 localhost kernel: Modules linked in: xt_recent
xt_conntrack ipt_REJECT xt_limit xt_tcpudp iptable_filter veth
ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables cbc bridge stp llc
zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) co
retemp x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt
iTCO_vendor_support evdev mac_hid crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel aesni_intel ast aes_x86_64 lrw
gf128mul glue_helper ttm ablk_helper cryptd drm_kms_helper igb
microcode drm psmouse ptp pps_core hwmon serio_raw dca pcspk
r syscopyarea i2c_i801 sysfillrect sysimgblt i2c_algo_bit i2c_core
lpc_ich fan thermal ipmi_si battery ipmi_msghandler video mei_me
shpchp mei tpm_infineon tpm_tis tpm button processor rbd libceph
May 12 17:43:20 localhost kernel:  crc32c libcrc32c ext4 crc16 mbcache
jbd2 sd_mod sr_mod crc_t10dif cdrom crct10dif_common atkbd libps2 ahci
libahci libata ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common
i8042 serio
May 12 17:43:20 localhost kernel: CPU: 1 PID: 22265 Comm: proc1
Tainted: P           O 3.14.1-1-js #1
May 12 17:43:20 localhost kernel: Hardware name: ASUSTeK COMPUTER INC.
RS100-E8-PI2/P9D-M Series, BIOS 0302 05/10/2013
May 12 17:43:20 localhost kernel: task: ffff88007a5909d0 ti:
ffff8802ba42c000 task.ti: ffff8802ba42c000
May 12 17:43:20 localhost kernel: RIP: 0010:[<ffffffff810b0b2b>]
[<ffffffff810b0b2b>] mspin_lock+0x2b/0x40
May 12 17:43:20 localhost kernel: RSP: 0018:ffff8802ba42de00  EFLAGS: 00010282
May 12 17:43:20 localhost kernel: RAX: ffffffff81666480 RBX:
ffff8802c8a9bc08 RCX: 00000000ffffffff
May 12 17:43:20 localhost kernel: RDX: 0000000000000000 RSI:
ffff8802ba42de10 RDI: ffff8802c8a9bc28
May 12 17:43:20 localhost kernel: RBP: ffff8802ba42de00 R08:
0000000000000000 R09: 0000000000000000
May 12 17:43:20 localhost kernel: R10: 0000000000000002 R11:
0000000000000400 R12: 000000008189ad60
May 12 17:43:20 localhost kernel: R13: ffff8802c8a9bc28 R14:
ffff88007a5909d0 R15: ffff8802ba42dfd8
May 12 17:43:20 localhost kernel: FS:  00007f86df906df0(0000)
GS:ffff88042fc40000(0000) knlGS:0000000000000000
May 12 17:43:20 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
May 12 17:43:20 localhost kernel: CR2: ffffffff81666480 CR3:
00000002e3378000 CR4: 00000000001407e0
May 12 17:43:20 localhost kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 12 17:43:20 localhost kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400
May 12 17:43:20 localhost kernel: Stack:
May 12 17:43:20 localhost kernel:  ffff8802ba42de58 ffffffff814dce6d
0000000000000000 ffff880000000000
May 12 17:43:20 localhost kernel:  ffff880419518660 ffff8802ba42de40
ffff8802c8a9bc08 ffff8802c8a9bc08
May 12 17:43:20 localhost kernel:  ffff8802c8a9bc00 ffff88008d0589d8
ffff880419518660 ffff8802ba42de70
May 12 17:43:20 localhost kernel: Call Trace:
May 12 17:43:20 localhost kernel:  [<ffffffff814dce6d>]
__mutex_lock_slowpath+0x6d/0x1f0
May 12 17:43:20 localhost kernel:  [<ffffffff814dd007>] mutex_lock+0x17/0x27
May 12 17:43:20 localhost kernel:  [<ffffffff811ed170>]
eventpoll_release_file+0x50/0xa0
May 12 17:43:20 localhost kernel:  [<ffffffff811a8273>] __fput+0x1f3/0x220
May 12 17:43:20 localhost kernel:  [<ffffffff811a82ee>] ____fput+0xe/0x10
May 12 17:43:20 localhost kernel:  [<ffffffff810848cf>] task_work_run+0x9f/0xe0
May 12 17:43:20 localhost kernel:  [<ffffffff81015adc>]
do_notify_resume+0x8c/0xa0
May 12 17:43:20 localhost kernel:  [<ffffffff814e6920>] int_signal+0x12/0x17
May 12 17:43:20 localhost kernel: Code: 0f 1f 44 00 00 55 c7 46 08 00
00 00 00 48 89 f0 48 c7 06 00 00 00 00 48 89 e5 48 87 07 48 85 c0 75
09 c7 46 08 01 00 00 00 5d c3 <48> 89 30 8b 46 08 85 c0 75 f4 f3 90 8b
46 08 85 c0 74 f7 5d c3
May 12 17:43:20 localhost kernel: RIP  [<ffffffff810b0b2b>] mspin_lock+0x2b/0x40
May 12 17:43:20 localhost kernel:  RSP <ffff8802ba42de00>
May 12 17:43:20 localhost kernel: CR2: ffffffff81666480
May 12 17:43:20 localhost kernel: ---[ end trace 60b4ebe6d1932f8a ]---
May 12 17:43:20 localhost kernel: note: proc1[22265] exited with preempt_count 1

----

May 12 17:43:50 localhost kernel: kernel tried to execute NX-protected
page - exploit attempt? (uid: 0)
May 12 17:43:50 localhost kernel: BUG: unable to handle kernel paging
request at ffff880419518660
May 12 17:43:50 localhost kernel: IP: [<ffff880419518660>] 0xffff880419518660
May 12 17:43:50 localhost kernel: PGD 1b28067 PUD 1b2b067 PMD 80000004194001e3
May 12 17:43:50 localhost kernel: Oops: 0011 [#2] PREEMPT SMP
May 12 17:43:50 localhost kernel: Modules linked in: xt_recent
xt_conntrack ipt_REJECT xt_limit xt_tcpudp iptable_filter veth
ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables cbc bridge stp llc
zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) coretemp
x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt
iTCO_vendor_support evdev mac_hid crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel aesni_intel ast aes_x86_64 lrw
gf128mul glue_helper ttm ablk_helper cryptd drm_kms_helper igb
microcode drm psmouse ptp pps_core hwmon serio_raw dca pcspkr
syscopyarea i2c_i801 sysfillrect sysimgblt i2c_algo_bit i2c_core
lpc_ich fan thermal ipmi_si battery ipmi_msghandler video mei_me
shpchp mei tpm_infineon tpm_tis tpm button processor rbd libceph
May 12 17:43:50 localhost kernel:  crc32c libcrc32c ext4 crc16 mbcache
jbd2 sd_mod sr_mod crc_t10dif cdrom crct10dif_common atkbd libps2 ahci
libahci libata ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common
i8042 serio
May 12 17:43:50 localhost kernel: CPU: 3 PID: 22285 Comm: proc2
Tainted: P      D    O 3.14.1-1-js #1
May 12 17:43:50 localhost kernel: Hardware name: ASUSTeK COMPUTER INC.
RS100-E8-PI2/P9D-M Series, BIOS 0302 05/10/2013
May 12 17:43:50 localhost kernel: task: ffff8803070c4e80 ti:
ffff8802ba4d2000 task.ti: ffff8802ba4d2000
May 12 17:43:50 localhost kernel: RIP: 0010:[<ffff880419518660>]
[<ffff880419518660>] 0xffff880419518660
May 12 17:43:50 localhost kernel: RSP: 0018:ffff8802ba4d3c78  EFLAGS: 00010246
May 12 17:43:50 localhost kernel: RAX: ffff8802ba42de10 RBX:
ffff8802c8a9bc00 RCX: 0000000000000246
May 12 17:43:50 localhost kernel: RDX: ffff8802ba4d3cb8 RSI:
ffff8802c8a9bc00 RDI: ffff88015075b200
May 12 17:43:50 localhost kernel: RBP: ffff8802ba4d3ca8 R08:
ffff8803070c4e80 R09: ffff88011e9fc018
May 12 17:43:50 localhost kernel: R10: 00000000ffffffff R11:
0000000000000202 R12: ffff88015075b200
May 12 17:43:50 localhost kernel: R13: ffff8802ba4d3cb8 R14:
0000000000000000 R15: ffff88015b6d8700
May 12 17:43:50 localhost kernel: FS:  00007fff346793e0(0000)
GS:ffff88042fcc0000(0000) knlGS:0000000000000000
May 12 17:43:50 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
May 12 17:43:50 localhost kernel: CR2: ffff880419518660 CR3:
000000031150c000 CR4: 00000000001407e0
May 12 17:43:50 localhost kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 12 17:43:50 localhost kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400
May 12 17:43:50 localhost kernel: Stack:
May 12 17:43:50 localhost kernel:  ffffffff813d03a0 ffff88011e9fc000
ffff88011e9fc018 ffff8802ba4d3cf0
May 12 17:43:50 localhost kernel:  ffff8802ba4d3d08 ffff8802c26316c0
ffff8802ba4d3ce8 ffffffff811ec07d
May 12 17:43:50 localhost kernel:  0000000000000000 0000000080002018
ffffffff811ebff0 ffff8802ba4d3d08
May 12 17:43:50 localhost kernel: Call Trace:
May 12 17:43:50 localhost kernel:  [<ffffffff813d03a0>] ? sock_poll+0x110/0x120
May 12 17:43:50 localhost kernel:  [<ffffffff811ec07d>]
ep_read_events_proc+0x8d/0xc0
May 12 17:43:50 localhost kernel:  [<ffffffff811ebff0>] ?
ep_show_fdinfo+0xa0/0xa0
May 12 17:43:50 localhost kernel:  [<ffffffff811ec80a>]
ep_scan_ready_list.isra.12+0x8a/0x1c0
May 12 17:43:50 localhost kernel:  [<ffffffff811ec940>] ?
ep_scan_ready_list.isra.12+0x1c0/0x1c0
May 12 17:43:50 localhost kernel:  [<ffffffff811ec95e>]
ep_poll_readyevents_proc+0x1e/0x20
May 12 17:43:50 localhost kernel:  [<ffffffff811ec493>]
ep_call_nested.constprop.13+0xb3/0x110
May 12 17:43:50 localhost kernel:  [<ffffffff811ece83>]
ep_eventpoll_poll+0x63/0xa0
May 12 17:43:50 localhost kernel:  [<ffffffff811ec157>]
ep_send_events_proc+0xa7/0x1c0
May 12 17:43:50 localhost kernel:  [<ffffffff811ec0b0>] ?
ep_read_events_proc+0xc0/0xc0
May 12 17:43:50 localhost kernel:  [<ffffffff811ec80a>]
ep_scan_ready_list.isra.12+0x8a/0x1c0
May 12 17:43:50 localhost kernel:  [<ffffffff811eca73>] ep_poll+0x113/0x340
May 12 17:43:50 localhost kernel:  [<ffffffff811c239e>] ? __fget+0x6e/0xb0
May 12 17:43:50 localhost kernel:  [<ffffffff811ee015>] SyS_epoll_wait+0xb5/0xe0
May 12 17:43:50 localhost kernel:  [<ffffffff814e66e9>]
system_call_fastpath+0x16/0x1b
May 12 17:43:50 localhost kernel: Code: 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 86 51 19 04 88
ff ff 80 87 c0 1e 04 88 ff ff <80> 87 c0 1e 04 88 ff ff 00 c8 52 19 04
88 ff ff 00 40 00 00 00
May 12 17:43:50 localhost kernel: RIP  [<ffff880419518660>] 0xffff880419518660
May 12 17:43:50 localhost kernel:  RSP <ffff8802ba4d3c78>
May 12 17:43:50 localhost kernel: CR2: ffff880419518660
May 12 17:43:50 localhost kernel: ---[ end trace 60b4ebe6d1932f8b ]---

The above happened when I was killing some stress-test processes that
used a lot of memory with CTRL+C (SIGINT). In two instances this
caused kernel paging failures in the two unrelated processes above
(used for a different stress test), so something is probably horribly
wrong with the kernel's memory management state. Maybe some module is
doing a double free or something similar, causing memory to be shared
between different contexts when it is allocated? Right now my guess
would be that ZFS is the problem, as it is known for integrating
poorly with the kernel memory-wise. We were experimenting with ZFS as
a backend for Ceph just to see what the performance and storage-saving
characteristics were like, but we're going to switch completely to
ext4 now and see if the problem goes away.

Thank you for your time,
Hannes


* Re: crash in rbd_img_request_create
  2014-05-12 17:28       ` Hannes Landeholm
@ 2014-05-13 12:35         ` Alex Elder
  2014-05-13 17:17           ` Hannes Landeholm
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Elder @ 2014-05-13 12:35 UTC
  To: Hannes Landeholm, Ceph Development, Ilya Dryomov; +Cc: Thorwald Lundqvist

On 05/12/2014 12:28 PM, Hannes Landeholm wrote:
> We were experimenting with ZFS as a backend for
> Ceph just to see what the performance and storage-saving
> characteristics were like, but we're going to switch completely to
> ext4 now and see if the problem goes away.

This is very good to know. Please report back what you find.
Thanks.

					-Alex


* Re: crash in rbd_img_request_create
  2014-05-13 12:35         ` Alex Elder
@ 2014-05-13 17:17           ` Hannes Landeholm
  2014-05-13 17:18             ` Alex Elder
  2014-05-13 20:58             ` Sage Weil
  0 siblings, 2 replies; 11+ messages in thread
From: Hannes Landeholm @ 2014-05-13 17:17 UTC
  To: Alex Elder, Ilya Dryomov, Ceph Development; +Cc: Thorwald Lundqvist

On Tue, May 13, 2014 at 2:35 PM, Alex Elder <elder@ieee.org> wrote:
> On 05/12/2014 12:28 PM, Hannes Landeholm wrote:
>> We were experimenting with ZFS as a backend for
>> Ceph just to see what the performance and storage-saving
>> characteristics were like, but we're going to switch completely to
>> ext4 now and see if the problem goes away.
>
> This is very good to know. Please report back what you find.

We completely cleansed the server in question of any trace of ZFS
and ran the same test again, using ext4 for the Ceph backend instead,
under the same conditions. Everything worked fine. The machine has also
stopped randomly locking up. We will not touch ZFS again, except maybe
in a distant future where it actually integrates well with Linux.

Thank you for your time,
Hannes


* Re: crash in rbd_img_request_create
  2014-05-13 17:17           ` Hannes Landeholm
@ 2014-05-13 17:18             ` Alex Elder
  2014-05-13 20:58             ` Sage Weil
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Elder @ 2014-05-13 17:18 UTC
  To: Hannes Landeholm, Ilya Dryomov, Ceph Development; +Cc: Thorwald Lundqvist

On 05/13/2014 12:17 PM, Hannes Landeholm wrote:
> On Tue, May 13, 2014 at 2:35 PM, Alex Elder <elder@ieee.org> wrote:
>> On 05/12/2014 12:28 PM, Hannes Landeholm wrote:
>>> We were experimenting with ZFS as a backend for
>>> Ceph just to see what the performance and storage-saving
>>> characteristics were like, but we're going to switch completely to
>>> ext4 now and see if the problem goes away.
>>
>> This is very good to know. Please report back what you find.
> 
> We completely cleansed the server in question of any trace of ZFS
> and ran the same test again, using ext4 for the Ceph backend instead,
> under the same conditions. Everything worked fine. The machine has also
> stopped randomly locking up. We will not touch ZFS again, except maybe
> in a distant future where it actually integrates well with Linux.

That's good news for rbd...  Thanks a lot.	-Alex

> Thank you for your time,
> Hannes
> 



* Re: crash in rbd_img_request_create
  2014-05-13 17:17           ` Hannes Landeholm
  2014-05-13 17:18             ` Alex Elder
@ 2014-05-13 20:58             ` Sage Weil
  2014-05-13 21:39               ` Hannes Landeholm
  1 sibling, 1 reply; 11+ messages in thread
From: Sage Weil @ 2014-05-13 20:58 UTC
  To: Hannes Landeholm
  Cc: Alex Elder, Ilya Dryomov, Ceph Development, Thorwald Lundqvist

Hi Hannes,

On Tue, 13 May 2014, Hannes Landeholm wrote:
> On Tue, May 13, 2014 at 2:35 PM, Alex Elder <elder@ieee.org> wrote:
> > On 05/12/2014 12:28 PM, Hannes Landeholm wrote:
> >> We were experimenting with ZFS as a backend for
> >> Ceph just to see what the performance and storage-saving
> >> characteristics were like, but we're going to switch completely to
> >> ext4 now and see if the problem goes away.
> >
> > This is very good to know. Please report back what you find.
> 
> We completely cleansed the server in question of any trace of ZFS
> and ran the same test again, using ext4 for the Ceph backend instead,
> under the same conditions. Everything worked fine. The machine has also
> stopped randomly locking up. We will not touch ZFS again, except maybe
> in a distant future where it actually integrates well with Linux.

Note that there was a patch a week or so ago fixing a problem in RBD when 
running ZFS on top of RBD (IIRC it had to do with length 0 bios or 
something?).  Was this ZFS on RBD or ZFS underneath the ceph-osd daemons?

sage


* Re: crash in rbd_img_request_create
  2014-05-13 20:58             ` Sage Weil
@ 2014-05-13 21:39               ` Hannes Landeholm
  0 siblings, 0 replies; 11+ messages in thread
From: Hannes Landeholm @ 2014-05-13 21:39 UTC
  To: Sage Weil; +Cc: Alex Elder, Ilya Dryomov, Ceph Development, Thorwald Lundqvist

On Tue, May 13, 2014 at 10:58 PM, Sage Weil <sage@inktank.com> wrote:
>
> Note that there was a patch a week or so ago fixing a problem in RBD when
> running ZFS on top of RBD (IIRC it had to do with length 0 bios or
> something?).  Was this ZFS on RBD or ZFS underneat the ceph-osd daemons?
>

Sorry if I was unclear on this. ZFS was running underneath the
ceph-osds. Today we switched to ext4 instead.

More specifically, we were interested in the compression features of
ZFS, since people claim it can increase throughput, lower latency and
lower disk usage with no significant drawbacks. This is because LZO
(and other algorithms of the same class, like Google Snappy)
immediately bails out if the data doesn't look trivially compressible,
which makes it comparable in speed to memcpy. It would be interesting
to look at experimentally supporting this in the OSDs themselves. It
could theoretically have huge benefits when storing disk images and
large binary files, which tend to contain a lot of uninitialized
regions full of zeroes.
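
As a rough sketch of what I mean (a userspace toy using liblzo2
directly, just to illustrate the idea, not a proposal for the actual
OSD code):

/* lzo_zero_demo.c -- build with: cc lzo_zero_demo.c -llzo2 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <lzo/lzo1x.h>

int main(void)
{
        /* 4 MB of zeroes, standing in for an uninitialized image region */
        const size_t src_len = 4 * 1024 * 1024;
        unsigned char *src = calloc(1, src_len);
        /* documented worst-case output size for lzo1x */
        unsigned char *dst = malloc(src_len + src_len / 16 + 64 + 3);
        unsigned char *wrkmem = malloc(LZO1X_1_MEM_COMPRESS);
        lzo_uint dst_len = 0;

        if (lzo_init() != LZO_E_OK || !src || !dst || !wrkmem) {
                fprintf(stderr, "init/alloc failed\n");
                return 1;
        }

        clock_t t0 = clock();
        int rc = lzo1x_1_compress(src, src_len, dst, &dst_len, wrkmem);
        clock_t t1 = clock();

        if (rc != LZO_E_OK) {
                fprintf(stderr, "compression failed: %d\n", rc);
                return 1;
        }
        printf("%zu bytes -> %lu bytes in %.3f ms\n", src_len,
               (unsigned long)dst_len,
               1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC);

        free(wrkmem);
        free(dst);
        free(src);
        return 0;
}

On zero-filled input the compressed size is tiny and the time is
barely measurable, which is the property that makes inline
compression attractive for sparse disk images.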

Let's take a super biased example: a really fresh MySQL InnoDB
database with WordPress installed:

> $ent ibdata1
> Entropy = 0.950773 bits per byte.
> Optimum compression would reduce the size
> of this 18874368 byte file by 88 percent.
> Chi square distribution for 18874368 samples is 3847986113.90, and randomly
> would exceed this value 0.01 percent of the times.
> Arithmetic mean value of data bytes is 16.0581 (127.5 = random).
> Monte Carlo value for Pi is 3.832257589 (error 21.98 percent).
> Serial correlation coefficient is 0.972634 (totally uncorrelated = 0.0).
---------

As you can see, according to ent this reference InnoDB data file is
very compressible. Standard LZO on a tmpfs:

> $time lzop ibdata1
> 0.01user 0.00system 0:00.04elapsed 35%CPU (0avgtext+0avgdata 1160maxresident)k
> 0inputs+0outputs (0major+372minor)pagefaults 0swaps
> $ls -l
> 18874368 May 13 23:20 ibdata1
> 536126 May 13 23:20 ibdata1.lzo

The time to compress 19 MB of data is hard to even measure, because
it's so fast that the signal disappears in the noise.

This reply got a bit long but I wanted to share some thoughts I had.

Thank you for your time,
Hannes

