* ext2/zram issue [was: Linux 5.19]
[not found] <CAHk-=wgrz5BBk=rCz7W28Fj_o02s0Xi0OEQ3H1uQgOdFvHgx0w@mail.gmail.com>
@ 2022-08-09 6:03 ` Jiri Slaby
2022-08-09 7:59 ` Jiri Slaby
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Jiri Slaby @ 2022-08-09 6:03 UTC
To: Linus Torvalds, Linux Kernel Mailing List
Cc: minchan, ngupta, Sergey Senozhatsky, Jan Kara, Ted Ts'o,
Andreas Dilger, Ext4 Developers List
Hi,
On 31. 07. 22, 23:43, Linus Torvalds wrote:
> So here we are, one week late, and 5.19 is tagged and pushed out.
>
> The full shortlog (just from rc8, obviously not all of 5.19) is below,
> but I can happily report that there is nothing really interesting in
> there. A lot of random small stuff.
Note: I originally reported this downstream for tracking at:
https://bugzilla.suse.com/show_bug.cgi?id=1202203
5.19 behaves pretty weird in openSUSE's openQA (as opposed to 5.18 or
5.18.15). It's all qemu-kvm "HW"¹⁾:
https://openqa.opensuse.org/tests/2502148
loop2: detected capacity change from 0 to 72264
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
to inode 57375 starting block 137216)
Buffer I/O error on device zram0, logical block 137216
Buffer I/O error on device zram0, logical block 137217
...
SQUASHFS error: xz decompression failed, data probably corrupt
SQUASHFS error: Failed to read block 0x2e41680: -5
SQUASHFS error: xz decompression failed, data probably corrupt
SQUASHFS error: Failed to read block 0x2e41680: -5
Bus error
https://openqa.opensuse.org/tests/2502145
FS-Cache: Loaded
begin 644 ldconfig.core.pid_2094.sig_7.time_1659859442
https://openqa.opensuse.org/tests/2502146
FS-Cache: Loaded
begin 644 Xorg.bin.core.pid_3733.sig_6.time_1659858784
https://openqa.opensuse.org/tests/2502148
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
to inode 57375 starting block 137216)
Buffer I/O error on device zram0, logical block 137216
Buffer I/O error on device zram0, logical block 137217
https://openqa.opensuse.org/tests/2502154
[ 13.158090][ T634] FS-Cache: Loaded
...
[ 525.627024][ C0] sysrq: Show State
Those are various failures -- crashes of ldconfig, Xorg; I/O failures on
zram; the last one is likely a lockup; something invoked sysrq after a
500s stall.
Interestingly, I've also hit this twice locally:
> init[1]: segfault at 18 ip 00007fb6154b4c81 sp 00007ffc243ed600 error
6 in libc.so.6[7fb61543f000+185000]
> Code: 41 5f c3 66 0f 1f 44 00 00 42 f6 44 10 08 01 0f 84 04 01 00 00
48 83 e1 fe 48 89 48 08 49 8b 47 70 49 89 5f 70 66 48 0f 6e c0 <48> 89
58 18 0f 16 44 24 08 48 81 fd ff 03 00 00 76 08 66 0f ef c9
> *** signal 11 ***
> malloc(): unsorted double linked list corrupted
> traps: init[1] general protection fault ip:7fb61543f8b9
sp:7ffc243ebf40 error:0 in libc.so.6[7fb61543f000+185000]
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> CPU: 0 PID: 1 Comm: init Not tainted 5.19.0-1-default #1 openSUSE
Tumbleweed e1df13166a33f423514290c702e43cfbb2b5b575
KASAN is not helpful either, so it's unlikely to be a memory corruption
(unless it is "HW" related; should I try to turn on the IOMMU in qemu?):
> kasan: KernelAddressSanitizer initialized
> ...
> zram: module verification failed: signature and/or required key missing - tainting kernel
> zram: Added device: zram0
> zram0: detected capacity change from 0 to 2097152
> EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
> EXT4-fs (zram0): mounted filesystem without journal. Quota mode: none.
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
> Buffer I/O error on device zram0, logical block 159744
> Buffer I/O error on device zram0, logical block 159745
They all look to me like a zram failure. The installer apparently
creates an ext2 FS, and after it is mounted using the ext4 module, the
issue starts occurring.
Any tests I/you could run on 5.19 to exercise zram and ext2? Otherwise I
am unable to reproduce easily, except using the openSUSE installer :/.
Any other ideas? Or is this known already?
¹⁾ the main factors are UEFI boot and virtio-blk (it likely happens with
virtio-scsi too). The cmdline _I_ use: qemu-kvm -device intel-hda -device
hda-duplex
-drive file=/tmp/pokus.qcow2,if=none,id=hd -device
virtio-blk-pci,drive=hd -drive
if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/qemu/ovmf-x86_64-opensuse-code.bin
-drive if=pflash,format=raw,unit=1,file=/tmp/vars.bin -cdrom
/tmp/cd1.iso -m 1G -smp 1 -net user -net nic,model=virtio -serial pty
-device virtio-rng-pci -device qemu-xhci,p2=4,p3=4 -usbdevice tablet
thanks,
--
js
suse labs
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 6:03 ` ext2/zram issue [was: Linux 5.19] Jiri Slaby
@ 2022-08-09 7:59 ` Jiri Slaby
2022-08-09 8:12 ` Jiri Slaby
2022-08-09 9:12 ` Lukas Czerner
[not found] ` <20220830214626.26544-1-charlie39@cock.li>
2 siblings, 1 reply; 20+ messages in thread
From: Jiri Slaby @ 2022-08-09 7:59 UTC
To: Linus Torvalds, Linux Kernel Mailing List
Cc: minchan, ngupta, Sergey Senozhatsky, Jan Kara, Ted Ts'o,
Andreas Dilger, Ext4 Developers List, avromanov, ddrokosov,
ngupta
On 09. 08. 22, 8:03, Jiri Slaby wrote:
> Hi,
>
> On 31. 07. 22, 23:43, Linus Torvalds wrote:
>> So here we are, one week late, and 5.19 is tagged and pushed out.
>>
>> The full shortlog (just from rc8, obviously not all of 5.19) is below,
>> but I can happily report that there is nothing really interesting in
>> there. A lot of random small stuff.
>
> Note: I originally reported this downstream for tracking at:
> https://bugzilla.suse.com/show_bug.cgi?id=1202203
>
> 5.19 behaves pretty weird in openSUSE's openQA (as opposed to 5.18 or
> 5.18.15). It's all qemu-kvm "HW"¹⁾:
> https://openqa.opensuse.org/tests/2502148
> loop2: detected capacity change from 0 to 72264
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
> to inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
> ...
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> Bus error
>
>
>
> https://openqa.opensuse.org/tests/2502145
> FS-Cache: Loaded
> begin 644 ldconfig.core.pid_2094.sig_7.time_1659859442
>
>
>
> https://openqa.opensuse.org/tests/2502146
> FS-Cache: Loaded
> begin 644 Xorg.bin.core.pid_3733.sig_6.time_1659858784
>
>
>
> https://openqa.opensuse.org/tests/2502148
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
> to inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
>
>
>
> https://openqa.opensuse.org/tests/2502154
> [ 13.158090][ T634] FS-Cache: Loaded
> ...
> [ 525.627024][ C0] sysrq: Show State
>
>
>
> Those are various failures -- crashes of ldconfig, Xorg; I/O failures on
> zram; the last one is likely a lockup; something invoked sysrq after a
> 500s stall.
>
> Interestingly, I've also hit this twice locally:
> > init[1]: segfault at 18 ip 00007fb6154b4c81 sp 00007ffc243ed600 error
> 6 in libc.so.6[7fb61543f000+185000]
> > Code: 41 5f c3 66 0f 1f 44 00 00 42 f6 44 10 08 01 0f 84 04 01 00 00
> 48 83 e1 fe 48 89 48 08 49 8b 47 70 49 89 5f 70 66 48 0f 6e c0 <48> 89
> 58 18 0f 16 44 24 08 48 81 fd ff 03 00 00 76 08 66 0f ef c9
> > *** signal 11 ***
> > malloc(): unsorted double linked list corrupted
> > traps: init[1] general protection fault ip:7fb61543f8b9
> sp:7ffc243ebf40 error:0 in libc.so.6[7fb61543f000+185000]
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > CPU: 0 PID: 1 Comm: init Not tainted 5.19.0-1-default #1 openSUSE
> Tumbleweed e1df13166a33f423514290c702e43cfbb2b5b575
>
> KASAN is not helpful either, so it's unlikely to be a memory corruption
> (unless it is "HW" related; should I try to turn on the IOMMU in qemu?):
>> kasan: KernelAddressSanitizer initialized
>> ...
>> zram: module verification failed: signature and/or required key
>> missing - tainting kernel
>> zram: Added device: zram0
>> zram0: detected capacity change from 0 to 2097152
>> EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
>> EXT4-fs (zram0): mounted filesystem without journal. Quota mode: none.
>> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing
>> to inode 16386 starting block 159744)
>> Buffer I/O error on device zram0, logical block 159744
>> Buffer I/O error on device zram0, logical block 159745
>
>
>
> They all look to me like a zram failure. The installer apparently
> creates an ext2 FS, and after it is mounted using the ext4 module, the
> issue starts occurring.
>
> Any tests I/you could run on 5.19 to exercise zram and ext2? Otherwise I
> am unable to reproduce easily, except using the openSUSE installer :/.
Ah, now I can. It's easy when one lowers the memory available to qemu;
-m 800M in this case:
echo $((1000*1024*1024)) > /sys/block/zram0/disksize
mkfs.ext2 /dev/zram0
mount /dev/zram0 /mnt/a/
dd if=/dev/urandom of=/mnt/a/stuff
[ 200.334277][ T8] EXT4-fs warning (device zram0): ext4_end_bio:343:
I/O error 10 writing to inode 12 starting block 8192)
[ 200.340198][ T8] Buffer I/O error on device zram0, logical block 8192
So currently, I blame:
commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
Author: Alexey Romanov <avromanov@sberdevices.ru>
Date: Thu May 12 20:23:07 2022 -0700
zram: remove double compression logic
/me needs to confirm.
> Any other ideas? Or is this known already?
>
> ¹⁾ the main factors are UEFI boot and virtio-blk (it likely happens with
> virtio-scsi too). The cmdline _I_ use: qemu-kvm -device intel-hda -device
> hda-duplex
> -drive file=/tmp/pokus.qcow2,if=none,id=hd -device
> virtio-blk-pci,drive=hd -drive
> if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/qemu/ovmf-x86_64-opensuse-code.bin -drive if=pflash,format=raw,unit=1,file=/tmp/vars.bin -cdrom /tmp/cd1.iso -m 1G -smp 1 -net user -net nic,model=virtio -serial pty -device virtio-rng-pci -device qemu-xhci,p2=4,p3=4 -usbdevice tablet
>
>
> thanks,
--
js
suse labs
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 7:59 ` Jiri Slaby
@ 2022-08-09 8:12 ` Jiri Slaby
2022-08-09 8:43 ` Sergey Senozhatsky
0 siblings, 1 reply; 20+ messages in thread
From: Jiri Slaby @ 2022-08-09 8:12 UTC
To: Linus Torvalds, Linux Kernel Mailing List
Cc: minchan, ngupta, Sergey Senozhatsky, Jan Kara, Ted Ts'o,
Andreas Dilger, Ext4 Developers List, avromanov, ddrokosov
On 09. 08. 22, 9:59, Jiri Slaby wrote:
> Ah, now I can. It's easy when one lowers the memory available to qemu;
> -m 800M in this case:
> echo $((1000*1024*1024)) > /sys/block/zram0/disksize
> mkfs.ext2 /dev/zram0
> mount /dev/zram0 /mnt/a/
> dd if=/dev/urandom of=/mnt/a/stuff
> [ 200.334277][ T8] EXT4-fs warning (device zram0): ext4_end_bio:343:
> I/O error 10 writing to inode 12 starting block 8192)
> [ 200.340198][ T8] Buffer I/O error on device zram0, logical block 8192
>
>
> So currently, I blame:
> commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
> Author: Alexey Romanov <avromanov@sberdevices.ru>
> Date: Thu May 12 20:23:07 2022 -0700
>
> zram: remove double compression logic
>
>
> /me needs to confirm.
With that commit reverted, I see no more I/O errors, only oom-killer
messages (which is OK IMO, provided I write 1G of urandom on a machine
w/ 800M of RAM):
[ 30.424603][ T728] dd invoked oom-killer:
gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Now let me submit it to openQA too...
thanks,
--
js
suse labs
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 8:12 ` Jiri Slaby
@ 2022-08-09 8:43 ` Sergey Senozhatsky
2022-08-09 9:11 ` Sergey Senozhatsky
0 siblings, 1 reply; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 8:43 UTC
To: Jiri Slaby
Cc: Linus Torvalds, Linux Kernel Mailing List, minchan, ngupta,
Sergey Senozhatsky, Jan Kara, Ted Ts'o, Andreas Dilger,
Ext4 Developers List, avromanov, ddrokosov
On (22/08/09 10:12), Jiri Slaby wrote:
> > So currently, I blame:
> > commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
> > Author: Alexey Romanov <avromanov@sberdevices.ru>
> > Date: Thu May 12 20:23:07 2022 -0700
> >
> > zram: remove double compression logic
> >
> >
> > /me needs to confirm.
>
> With that commit reverted, I see no more I/O errors, only oom-killer
> messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> 800M of RAM):
Hmm... So handle allocation always succeeds in the slow path? (when we
try to allocate it second time)
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 8:43 ` Sergey Senozhatsky
@ 2022-08-09 9:11 ` Sergey Senozhatsky
2022-08-09 9:20 ` Sergey Senozhatsky
0 siblings, 1 reply; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 9:11 UTC
To: Jiri Slaby
Cc: Linus Torvalds, Linux Kernel Mailing List, minchan, ngupta,
Jan Kara, Ted Ts'o, Andreas Dilger, Ext4 Developers List,
avromanov, ddrokosov, Sergey Senozhatsky
On (22/08/09 17:43), Sergey Senozhatsky wrote:
> On (22/08/09 10:12), Jiri Slaby wrote:
> > > So currently, I blame:
> > > commit e7be8d1dd983156bbdd22c0319b71119a8fbb697
> > > Author: Alexey Romanov <avromanov@sberdevices.ru>
> > > Date: Thu May 12 20:23:07 2022 -0700
> > >
> > > zram: remove double compression logic
> > >
> > >
> > > /me needs to confirm.
> >
> > With that commit reverted, I see no more I/O errors, only oom-killer
> > messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> > 800M of RAM):
>
> Hmm... So handle allocation always succeeds in the slow path? (when we
> try to allocate it second time)
Yeah I can see how handle re-allocation with direct reclaim can make it more
successful, but in exchange it oom-kills some user-space process, I suppose.
Is oom-kill really a good alternative though?
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 6:03 ` ext2/zram issue [was: Linux 5.19] Jiri Slaby
2022-08-09 7:59 ` Jiri Slaby
@ 2022-08-09 9:12 ` Lukas Czerner
2022-08-09 9:15 ` Sergey Senozhatsky
[not found] ` <20220830214626.26544-1-charlie39@cock.li>
2 siblings, 1 reply; 20+ messages in thread
From: Lukas Czerner @ 2022-08-09 9:12 UTC
To: Jiri Slaby
Cc: Linus Torvalds, Linux Kernel Mailing List, minchan, ngupta,
Sergey Senozhatsky, Jan Kara, Ted Ts'o, Andreas Dilger,
Ext4 Developers List
On Tue, Aug 09, 2022 at 08:03:11AM +0200, Jiri Slaby wrote:
> Hi,
>
> On 31. 07. 22, 23:43, Linus Torvalds wrote:
> > So here we are, one week late, and 5.19 is tagged and pushed out.
> >
> > The full shortlog (just from rc8, obviously not all of 5.19) is below,
> > but I can happily report that there is nothing really interesting in
> > there. A lot of random small stuff.
>
> Note: I originally reported this downstream for tracking at:
> https://bugzilla.suse.com/show_bug.cgi?id=1202203
>
> 5.19 behaves pretty weird in openSUSE's openQA (as opposed to 5.18 or
> 5.18.15). It's all qemu-kvm "HW"¹⁾:
> https://openqa.opensuse.org/tests/2502148
> loop2: detected capacity change from 0 to 72264
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to
> inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
> ...
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> SQUASHFS error: xz decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0x2e41680: -5
> Bus error
>
>
>
> https://openqa.opensuse.org/tests/2502145
> FS-Cache: Loaded
> begin 644 ldconfig.core.pid_2094.sig_7.time_1659859442
>
>
>
> https://openqa.opensuse.org/tests/2502146
> FS-Cache: Loaded
> begin 644 Xorg.bin.core.pid_3733.sig_6.time_1659858784
>
>
>
> https://openqa.opensuse.org/tests/2502148
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to
> inode 57375 starting block 137216)
> Buffer I/O error on device zram0, logical block 137216
> Buffer I/O error on device zram0, logical block 137217
>
>
>
> https://openqa.opensuse.org/tests/2502154
> [ 13.158090][ T634] FS-Cache: Loaded
> ...
> [ 525.627024][ C0] sysrq: Show State
>
>
>
> Those are various failures -- crashes of ldconfig, Xorg; I/O failures on
> zram; the last one is likely a lockup; something invoked sysrq after a
> 500s stall.
>
> Interestingly, I've also hit this twice locally:
> > init[1]: segfault at 18 ip 00007fb6154b4c81 sp 00007ffc243ed600 error 6 in
> libc.so.6[7fb61543f000+185000]
> > Code: 41 5f c3 66 0f 1f 44 00 00 42 f6 44 10 08 01 0f 84 04 01 00 00 48 83
> e1 fe 48 89 48 08 49 8b 47 70 49 89 5f 70 66 48 0f 6e c0 <48> 89 58 18 0f 16
> 44 24 08 48 81 fd ff 03 00 00 76 08 66 0f ef c9
> > *** signal 11 ***
> > malloc(): unsorted double linked list corrupted
> > traps: init[1] general protection fault ip:7fb61543f8b9 sp:7ffc243ebf40
> error:0 in libc.so.6[7fb61543f000+185000]
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > CPU: 0 PID: 1 Comm: init Not tainted 5.19.0-1-default #1 openSUSE
> Tumbleweed e1df13166a33f423514290c702e43cfbb2b5b575
>
> KASAN is not helpful either, so it's unlikely to be a memory corruption
> (unless it is "HW" related; should I try to turn on the IOMMU in qemu?):
> > kasan: KernelAddressSanitizer initialized
> > ...
> > zram: module verification failed: signature and/or required key missing - tainting kernel
> > zram: Added device: zram0
> > zram0: detected capacity change from 0 to 2097152
> > EXT4-fs (zram0): mounting ext2 file system using the ext4 subsystem
> > EXT4-fs (zram0): mounted filesystem without journal. Quota mode: none.
> > EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
> > Buffer I/O error on device zram0, logical block 159744
> > Buffer I/O error on device zram0, logical block 159745
>
>
>
> They all look to me like a zram failure. The installer apparently
> creates an ext2 FS, and after it is mounted using the ext4 module, the
> issue starts occurring.
>
> Any tests I/you could run on 5.19 to exercise zram and ext2? Otherwise I am
> unable to reproduce easily, except using the openSUSE installer :/.
Hi Jiri,
I've tried a quick xfstests run on ext2 on zram and I can't see any
issues like this so far. I will run a full test and report back in case
there is anything obvious.
-Lukas
>
> Any other ideas? Or is this known already?
>
> ¹⁾ the main factors are UEFI boot and virtio-blk (it likely happens with
> virtio-scsi too). The cmdline _I_ use: qemu-kvm -device intel-hda -device
> hda-duplex
> -drive file=/tmp/pokus.qcow2,if=none,id=hd -device virtio-blk-pci,drive=hd
> -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/qemu/ovmf-x86_64-opensuse-code.bin
> -drive if=pflash,format=raw,unit=1,file=/tmp/vars.bin -cdrom /tmp/cd1.iso
> -m 1G -smp 1 -net user -net nic,model=virtio -serial pty -device
> virtio-rng-pci -device qemu-xhci,p2=4,p3=4 -usbdevice tablet
>
>
> thanks,
> --
> js
> suse labs
>
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 9:12 ` Lukas Czerner
@ 2022-08-09 9:15 ` Sergey Senozhatsky
2022-08-09 9:53 ` Lukas Czerner
0 siblings, 1 reply; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 9:15 UTC
To: Lukas Czerner
Cc: Jiri Slaby, Linus Torvalds, Linux Kernel Mailing List, minchan,
ngupta, Sergey Senozhatsky, Jan Kara, Ted Ts'o,
Andreas Dilger, Ext4 Developers List
On (22/08/09 11:12), Lukas Czerner wrote:
> Hi Jiri,
>
> I've tried a quick xfstests run on ext2 on zram and I can't see any
> issues like this so far. I will run a full test and report back in case
> there is anything obvious.
AFAICT this should be visible only when we are under memory pressure,
so that direct reclaim from zs_malloc handle allocation call makes a
difference.
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 9:11 ` Sergey Senozhatsky
@ 2022-08-09 9:20 ` Sergey Senozhatsky
2022-08-09 10:20 ` Dmitry Rokosov
2022-08-09 12:35 ` ext2/zram issue [was: Linux 5.19] Jiri Slaby
0 siblings, 2 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 9:20 UTC
To: Jiri Slaby
Cc: Linus Torvalds, Linux Kernel Mailing List, minchan, ngupta,
Jan Kara, Ted Ts'o, Andreas Dilger, Ext4 Developers List,
avromanov, ddrokosov, Sergey Senozhatsky
On (22/08/09 18:11), Sergey Senozhatsky wrote:
> > > > /me needs to confirm.
> > >
> > > With that commit reverted, I see no more I/O errors, only oom-killer
> > > messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> > > 800M of RAM):
> >
> > Hmm... So handle allocation always succeeds in the slow path? (when we
> > try to allocate it second time)
>
> Yeah I can see how handle re-allocation with direct reclaim can make it more
> successful, but in exchange it oom-kills some user-space process, I suppose.
> Is oom-kill really a good alternative though?
We likely will need to revert e7be8d1dd983 given that it has some
user-visible changes. But, honestly, failing a zram write vs oom-killing
a user-space process is a tough choice.
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 9:15 ` Sergey Senozhatsky
@ 2022-08-09 9:53 ` Lukas Czerner
0 siblings, 0 replies; 20+ messages in thread
From: Lukas Czerner @ 2022-08-09 9:53 UTC
To: Sergey Senozhatsky
Cc: Jiri Slaby, Linus Torvalds, Linux Kernel Mailing List, minchan,
ngupta, Jan Kara, Ted Ts'o, Andreas Dilger,
Ext4 Developers List
On Tue, Aug 09, 2022 at 06:15:37PM +0900, Sergey Senozhatsky wrote:
> On (22/08/09 11:12), Lukas Czerner wrote:
> > Hi Jiri,
> >
> > I've tried a quick xfstests run on ext2 on zram and I can't see any
> > issues like this so far. I will run a full test and report back in case
> > there is anything obvious.
>
> AFAICT this should be visible only when we are under memory pressure,
> so that direct reclaim from zs_malloc handle allocation call makes a
> difference.
>
True, I haven't seen the other email from Jiri, sorry about that. I can
confirm that under memory pressure it is in fact reproducible with
xfstests and also I can confirm that reverting
e7be8d1dd983156bbdd22c0319b71119a8fbb697 makes it go away.
But Jiri has a better repro already anyway.
Thanks!
-Lukas
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 9:20 ` Sergey Senozhatsky
@ 2022-08-09 10:20 ` Dmitry Rokosov
2022-08-09 11:53 ` Sergey Senozhatsky
2022-08-09 12:35 ` ext2/zram issue [was: Linux 5.19] Jiri Slaby
1 sibling, 1 reply; 20+ messages in thread
From: Dmitry Rokosov @ 2022-08-09 10:20 UTC
To: Sergey Senozhatsky
Cc: Jiri Slaby, Linus Torvalds, Linux Kernel Mailing List,
minchan@kernel.org, ngupta@vflare.org, Jan Kara, Ted Ts'o,
Andreas Dilger, Ext4 Developers List, Aleksey Romanov
Hello Sergey,
On Tue, Aug 09, 2022 at 06:20:04PM +0900, Sergey Senozhatsky wrote:
> On (22/08/09 18:11), Sergey Senozhatsky wrote:
> > > > > /me needs to confirm.
> > > >
> > > > With that commit reverted, I see no more I/O errors, only oom-killer
> > > > messages (which is OK IMO, provided I write 1G of urandom on a machine w/
> > > > 800M of RAM):
> > >
> > > Hmm... So handle allocation always succeeds in the slow path? (when we
> > > try to allocate it second time)
> >
> > Yeah I can see how handle re-allocation with direct reclaim can make it more
> > successful, but in exchange it oom-kills some user-space process, I suppose.
> > Is oom-kill really a good alternative though?
>
> We likely will need to revert e7be8d1dd983 given that it has some
> user-visible changes. But, honestly, failing a zram write vs oom-killing
> a user-space process is a tough choice.
I think oom-kill is an inevitable escape from a low-memory situation if we
don't solve the original problem of high memory consumption in the user
setup. The reclaim-based zram slow path just delays the OOM if the
memory-eating root cause is not resolved.
I totally agree with you that all patches which cause user-visible
degradations should be reverted, but maybe this is more of a user-setup
problem; what do you think?
If you decide to revert the slow-path removal patch, I would prefer to
review the original patch with the unneeded code removal again, if you
don't mind:
https://lore.kernel.org/linux-block/20220422115959.3313-1-avromanov@sberdevices.ru/
--
Thank you,
Dmitry
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 10:20 ` Dmitry Rokosov
@ 2022-08-09 11:53 ` Sergey Senozhatsky
2022-08-09 13:15 ` Aleksey Romanov
2022-08-10 7:06 ` [PATCH] Revert "zram: remove double compression logic" Jiri Slaby
0 siblings, 2 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 11:53 UTC
To: Dmitry Rokosov, Jiri Slaby, Minchan Kim
Cc: Sergey Senozhatsky, Jiri Slaby, Linus Torvalds,
Linux Kernel Mailing List, ngupta@vflare.org, Jan Kara,
Ted Ts'o, Andreas Dilger, Ext4 Developers List,
Aleksey Romanov
Hi,
On (22/08/09 10:20), Dmitry Rokosov wrote:
> I think oom-kill is an inevitable escape from a low-memory situation if we
> don't solve the original problem of high memory consumption in the user
> setup. The reclaim-based zram slow path just delays the OOM if the
> memory-eating root cause is not resolved.
>
> I totally agree with you that all patches which cause user-visible
> degradations should be reverted, but maybe this is more of a user-setup
> problem; what do you think?
I'd go with the revert.
Jiri, are you going to send the revert patch or shall I handle it?
> If you decide to revert the slow-path removal patch, I would prefer to
> review the original patch with the unneeded code removal again, if you
> don't mind:
> https://lore.kernel.org/linux-block/20220422115959.3313-1-avromanov@sberdevices.ru/
Sure, we can return to it after the merge window.
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 9:20 ` Sergey Senozhatsky
2022-08-09 10:20 ` Dmitry Rokosov
@ 2022-08-09 12:35 ` Jiri Slaby
2022-08-09 12:45 ` Jiri Slaby
1 sibling, 1 reply; 20+ messages in thread
From: Jiri Slaby @ 2022-08-09 12:35 UTC
To: Sergey Senozhatsky
Cc: Linus Torvalds, Linux Kernel Mailing List, minchan, ngupta,
Jan Kara, Ted Ts'o, Andreas Dilger, Ext4 Developers List,
avromanov, ddrokosov
On 09. 08. 22, 11:20, Sergey Senozhatsky wrote:
> On (22/08/09 18:11), Sergey Senozhatsky wrote:
>>>>> /me needs to confirm.
>>>>
>>>> With that commit reverted, I see no more I/O errors, only oom-killer
>>>> messages (which is OK IMO, provided I write 1G of urandom on a machine w/
>>>> 800M of RAM):
>>>
>>> Hmm... So handle allocation always succeeds in the slow path? (when we
>>> try to allocate it second time)
>>
>> Yeah I can see how handle re-allocation with direct reclaim can make it more
>> successful, but in exchange it oom-kills some user-space process, I suppose.
>> Is oom-kill really a good alternative though?
>
> We likely will need to revert e7be8d1dd983 given that it has some
> user-visible changes. But, honestly, failing a zram write vs oom-killing
> a user-space process is a tough choice.
Note that it OOMs only in my use case -- it's obviously a too-large zram
on a machine with too little memory.
But the installer is different. It just creates memory pressure, yet,
reclaim works well and is able to find memory and go on. I would say
atomic vs non-atomic retry in the original (pre-5.19) approach makes the
difference.
And yes, we should likely increase the memory in openQA to avoid too
many reclaims...
PS: the kernel finished building and the images are being built now, hence
the new openQA run hasn't started yet. I will send the revert when it's
complete and all green.
thanks,
--
js
suse labs
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 12:35 ` ext2/zram issue [was: Linux 5.19] Jiri Slaby
@ 2022-08-09 12:45 ` Jiri Slaby
2022-08-09 12:57 ` Sergey Senozhatsky
0 siblings, 1 reply; 20+ messages in thread
From: Jiri Slaby @ 2022-08-09 12:45 UTC
To: Sergey Senozhatsky
Cc: Linus Torvalds, Linux Kernel Mailing List, minchan, ngupta,
Jan Kara, Ted Ts'o, Andreas Dilger, Ext4 Developers List,
avromanov, ddrokosov
On 09. 08. 22, 14:35, Jiri Slaby wrote:
> But the installer is different. It just creates memory pressure, yet,
> reclaim works well and is able to find memory and go on. I would say
> atomic vs non-atomic retry in the original (pre-5.19) approach makes the
> difference.
Sorry, I meant no-direct-reclaim (5.19) vs direct-reclaim (pre-5.19).
--
js
suse labs
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 12:45 ` Jiri Slaby
@ 2022-08-09 12:57 ` Sergey Senozhatsky
2022-08-09 13:07 ` Sergey Senozhatsky
0 siblings, 1 reply; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 12:57 UTC
To: Jiri Slaby
Cc: Sergey Senozhatsky, Linus Torvalds, Linux Kernel Mailing List,
minchan, ngupta, Jan Kara, Ted Ts'o, Andreas Dilger,
Ext4 Developers List, avromanov, ddrokosov
On (22/08/09 14:45), Jiri Slaby wrote:
> On 09. 08. 22, 14:35, Jiri Slaby wrote:
> > But the installer is different. It just creates memory pressure, yet,
> > reclaim works well and is able to find memory and go on. I would say
> > atomic vs non-atomic retry in the original (pre-5.19) approach makes the
> > difference.
>
> Sorry, I meant no-direct-reclaim (5.19) vs direct-reclaim (pre-5.19).
Sure, I understood.
This somehow makes me scratch my head and ask if we really want to
continue using per-CPU streams. We previously (many years ago) had
a list of idle compression streams, which would do compression in
preemptible context and we would have only one zs_malloc handle
allocation path, which would do direct reclaim (when needed)
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 12:57 ` Sergey Senozhatsky
@ 2022-08-09 13:07 ` Sergey Senozhatsky
0 siblings, 0 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 13:07 UTC
To: Jiri Slaby
Cc: Linus Torvalds, Linux Kernel Mailing List, minchan, ngupta,
Jan Kara, Ted Ts'o, Andreas Dilger, Ext4 Developers List,
avromanov, ddrokosov, Sergey Senozhatsky
On (22/08/09 21:57), Sergey Senozhatsky wrote:
> This somehow makes me scratch my head and ask if we really want to
> continue using per-CPU streams. We previously (many years ago) had
> a list of idle compression streams, which would do compression in
> preemptible context and we would have only one zs_malloc handle
> allocation path, which would do direct reclaim (when needed)
Scratch that, I take it back. Sorry for the noise.
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 11:53 ` Sergey Senozhatsky
@ 2022-08-09 13:15 ` Aleksey Romanov
2022-08-09 13:29 ` Sergey Senozhatsky
2022-08-10 7:06 ` [PATCH] Revert "zram: remove double compression logic" Jiri Slaby
1 sibling, 1 reply; 20+ messages in thread
From: Aleksey Romanov @ 2022-08-09 13:15 UTC
To: Sergey Senozhatsky
Cc: Dmitry Rokosov, Jiri Slaby, Minchan Kim, Linus Torvalds,
Linux Kernel Mailing List, ngupta@vflare.org, Jan Kara,
Ted Ts'o, Andreas Dilger, Ext4 Developers List
Hi Sergey,
On Tue, Aug 09, 2022 at 08:53:36PM +0900, Sergey Senozhatsky wrote:
> > If you decide to revert the slow-path removal patch, I would prefer to
> > review the original patch with the unneeded code removal again, if you
> > don't mind:
> > https://lore.kernel.org/linux-block/20220422115959.3313-1-avromanov@sberdevices.ru/
>
> Sure, we can return to it after the merge window.
In that case, do I need to send my original patch again?
--
Thank you,
Alexey
* Re: ext2/zram issue [was: Linux 5.19]
2022-08-09 13:15 ` Aleksey Romanov
@ 2022-08-09 13:29 ` Sergey Senozhatsky
0 siblings, 0 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-09 13:29 UTC (permalink / raw)
To: Aleksey Romanov
Cc: Sergey Senozhatsky, Dmitry Rokosov, Jiri Slaby, Minchan Kim,
Linus Torvalds, Linux Kernel Mailing List, ngupta@vflare.org,
Jan Kara, Ted Ts'o, Andreas Dilger, Ext4 Developers List
On (22/08/09 13:15), Aleksey Romanov wrote:
> On Tue, Aug 09, 2022 at 08:53:36PM +0900, Sergey Senozhatsky wrote:
> > > If you make the decision to revert slow path removal patch, I would
> > > prefer to review the original patch with unneeded code removal again
> > > if you don't mind:
> > > https://lore.kernel.org/linux-block/20220422115959.3313-1-avromanov@sberdevices.ru/
> >
> > Sure, we can return to it after the merge window.
>
> In that case, do I need to send my original patch again?
Would be nice, since the patch needs rebasing (due to zsmalloc PTR_ERR changes)
* [PATCH] Revert "zram: remove double compression logic"
2022-08-09 11:53 ` Sergey Senozhatsky
2022-08-09 13:15 ` Aleksey Romanov
@ 2022-08-10 7:06 ` Jiri Slaby
2022-08-10 7:14 ` Sergey Senozhatsky
1 sibling, 1 reply; 20+ messages in thread
From: Jiri Slaby @ 2022-08-10 7:06 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, jack, adilger.kernel, tytso, Jiri Slaby, stable,
Minchan Kim, Nitin Gupta, Sergey Senozhatsky, Alexey Romanov,
Dmitry Rokosov, Lukas Czerner, Ext4 Developers List
This reverts commit e7be8d1dd983156bbdd22c0319b71119a8fbb697 as it
causes zram failures. It does not revert cleanly, PTR_ERR handling was
introduced in the meantime. This is handled by appropriate IS_ERR.
When under memory pressure, zs_malloc() can fail. Before the above
commit, the allocation was retried with direct reclaim enabled
(GFP_NOIO). After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is
tried.
So when the failure occurs under memory pressure, the overlaying
filesystem such as ext2 (mounted by ext4 module in this case) can emit
failures, making the (file)system unusable:
EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
Buffer I/O error on device zram0, logical block 159744
With direct reclaim, memory is really reclaimed and allocation succeeds,
eventually. In the worst case, the oom killer is invoked, which is
proper outcome if user sets up zram too large (in comparison to
available RAM).
This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
above). Use revert of e7be8d1dd983 directly.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Cc: stable@vger.kernel.org # 5.19
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
drivers/block/zram/zram_drv.c | 42 ++++++++++++++++++++++++++---------
drivers/block/zram/zram_drv.h | 1 +
2 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 92cb929a45b7..226ea76cc819 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1146,14 +1146,15 @@ static ssize_t bd_stat_show(struct device *dev,
static ssize_t debug_stat_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- int version = 2;
+ int version = 1;
struct zram *zram = dev_to_zram(dev);
ssize_t ret;
down_read(&zram->init_lock);
ret = scnprintf(buf, PAGE_SIZE,
- "version: %d\n%8llu\n",
+ "version: %d\n%8llu %8llu\n",
version,
+ (u64)atomic64_read(&zram->stats.writestall),
(u64)atomic64_read(&zram->stats.miss_free));
up_read(&zram->init_lock);
@@ -1351,7 +1352,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
{
int ret = 0;
unsigned long alloced_pages;
- unsigned long handle = 0;
+ unsigned long handle = -ENOMEM;
unsigned int comp_len = 0;
void *src, *dst, *mem;
struct zcomp_strm *zstrm;
@@ -1369,6 +1370,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
}
kunmap_atomic(mem);
+compress_again:
zstrm = zcomp_stream_get(zram->comp);
src = kmap_atomic(page);
ret = zcomp_compress(zstrm, src, &comp_len);
@@ -1377,20 +1379,39 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
if (unlikely(ret)) {
zcomp_stream_put(zram->comp);
pr_err("Compression failed! err=%d\n", ret);
+ zs_free(zram->mem_pool, handle);
return ret;
}
if (comp_len >= huge_class_size)
comp_len = PAGE_SIZE;
-
- handle = zs_malloc(zram->mem_pool, comp_len,
- __GFP_KSWAPD_RECLAIM |
- __GFP_NOWARN |
- __GFP_HIGHMEM |
- __GFP_MOVABLE);
-
+ /*
+ * handle allocation has 2 paths:
+ * a) fast path is executed with preemption disabled (for
+ * per-cpu streams) and has __GFP_DIRECT_RECLAIM bit clear,
+ * since we can't sleep;
+ * b) slow path enables preemption and attempts to allocate
+ * the page with __GFP_DIRECT_RECLAIM bit set. we have to
+ * put per-cpu compression stream and, thus, to re-do
+ * the compression once handle is allocated.
+ *
+ * if we have a 'non-null' handle here then we are coming
+ * from the slow path and handle has already been allocated.
+ */
+ if (IS_ERR((void *)handle))
+ handle = zs_malloc(zram->mem_pool, comp_len,
+ __GFP_KSWAPD_RECLAIM |
+ __GFP_NOWARN |
+ __GFP_HIGHMEM |
+ __GFP_MOVABLE);
if (IS_ERR((void *)handle)) {
zcomp_stream_put(zram->comp);
+ atomic64_inc(&zram->stats.writestall);
+ handle = zs_malloc(zram->mem_pool, comp_len,
+ GFP_NOIO | __GFP_HIGHMEM |
+ __GFP_MOVABLE);
+ if (!IS_ERR((void *)handle))
+ goto compress_again;
return PTR_ERR((void *)handle);
}
@@ -1948,6 +1969,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
if (ret)
goto out_cleanup_disk;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 158c91e54850..80c3b43b4828 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -81,6 +81,7 @@ struct zram_stats {
atomic64_t huge_pages_since; /* no. of huge pages since zram set up */
atomic64_t pages_stored; /* no. of pages currently stored */
atomic_long_t max_used_pages; /* no. of maximum pages stored */
+ atomic64_t writestall; /* no. of write slow paths */
atomic64_t miss_free; /* no. of missed free */
#ifdef CONFIG_ZRAM_WRITEBACK
atomic64_t bd_count; /* no. of pages in backing device */
--
2.37.1
* Re: [PATCH] Revert "zram: remove double compression logic"
2022-08-10 7:06 ` [PATCH] Revert "zram: remove double compression logic" Jiri Slaby
@ 2022-08-10 7:14 ` Sergey Senozhatsky
0 siblings, 0 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2022-08-10 7:14 UTC (permalink / raw)
To: Jiri Slaby
Cc: akpm, linux-kernel, jack, adilger.kernel, tytso, stable,
Minchan Kim, Nitin Gupta, Sergey Senozhatsky, Alexey Romanov,
Dmitry Rokosov, Lukas Czerner, Ext4 Developers List
On (22/08/10 09:06), Jiri Slaby wrote:
> This reverts commit e7be8d1dd983156bbdd22c0319b71119a8fbb697 as it
> causes zram failures. It does not revert cleanly, PTR_ERR handling was
> introduced in the meantime. This is handled by appropriate IS_ERR.
>
> When under memory pressure, zs_malloc() can fail. Before the above
> commit, the allocation was retried with direct reclaim enabled
> (GFP_NOIO). After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is
> tried.
>
> So when the failure occurs under memory pressure, the overlaying
> filesystem such as ext2 (mounted by ext4 module in this case) can emit
> failures, making the (file)system unusable:
> EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744)
> Buffer I/O error on device zram0, logical block 159744
>
> With direct reclaim, memory is really reclaimed and allocation succeeds,
> eventually. In the worst case, the oom killer is invoked, which is
> proper outcome if user sets up zram too large (in comparison to
> available RAM).
>
> This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note
> above). Use revert of e7be8d1dd983 directly.
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
> Fixes: e7be8d1dd983 ("zram: remove double compression logic")
> Cc: stable@vger.kernel.org # 5.19
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
> Cc: Alexey Romanov <avromanov@sberdevices.ru>
> Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru>
> Cc: Lukas Czerner <lczerner@redhat.com>
> Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
* Re: ext2/zram issue [was: Linux 5.19]
[not found] ` <20220830214626.26544-1-charlie39@cock.li>
@ 2022-08-31 7:55 ` Jiri Slaby
0 siblings, 0 replies; 20+ messages in thread
From: Jiri Slaby @ 2022-08-31 7:55 UTC (permalink / raw)
To: charlie39
Cc: adilger.kernel, jack, linux-ext4, linux-kernel, minchan, ngupta,
senozhatsky, torvalds, tytso
On 30. 08. 22, 23:46, charlie39@cock.li wrote:
> Hi, I think I bumped into the same issue on version 5.19.2 with ext4 on zram mounted on /tmp
Only 5.19.6 contains the fix.
> ```
> # sudo dmesg -T | grep ext4
>
> [Tue Aug 30 21:41:45 2022] EXT4-fs error (device zram1): ext4_check_bdev_write_error:218: comm kworker/u8:3: Error while
> [Tue Aug 30 21:41:45 2022] EXT4-fs warning (device zram1): ext4_end_bio:347: I/O error 10 writing to inode 76 starting b
> [Tue Aug 30 21:41:45 2022] EXT4-fs warning (device zram1): ext4_end_bio:347: I/O error 10 writing to inode 76 starting b
> [Tue Aug 30 21:41:45 2022] EXT4-fs warning (device zram1): ext4_end_bio:347: I/O error 10 writing to inode 66 starting b
> [Tue Aug 30 22:07:02 2022] EXT4-fs error (device zram1): ext4_journal_check_start:83: comm ThreadPoolForeg: Detected abo
> [Tue Aug 30 22:07:02 2022] EXT4-fs (zram1): Remounting filesystem read-only
>
> ```
> Not sure what caused it, I was just updating my Arch system.
>
--
js
end of thread, other threads:[~2022-08-31 7:56 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAHk-=wgrz5BBk=rCz7W28Fj_o02s0Xi0OEQ3H1uQgOdFvHgx0w@mail.gmail.com>
2022-08-09 6:03 ` ext2/zram issue [was: Linux 5.19] Jiri Slaby
2022-08-09 7:59 ` Jiri Slaby
2022-08-09 8:12 ` Jiri Slaby
2022-08-09 8:43 ` Sergey Senozhatsky
2022-08-09 9:11 ` Sergey Senozhatsky
2022-08-09 9:20 ` Sergey Senozhatsky
2022-08-09 10:20 ` Dmitry Rokosov
2022-08-09 11:53 ` Sergey Senozhatsky
2022-08-09 13:15 ` Aleksey Romanov
2022-08-09 13:29 ` Sergey Senozhatsky
2022-08-10 7:06 ` [PATCH] Revert "zram: remove double compression logic" Jiri Slaby
2022-08-10 7:14 ` Sergey Senozhatsky
2022-08-09 12:35 ` ext2/zram issue [was: Linux 5.19] Jiri Slaby
2022-08-09 12:45 ` Jiri Slaby
2022-08-09 12:57 ` Sergey Senozhatsky
2022-08-09 13:07 ` Sergey Senozhatsky
2022-08-09 9:12 ` Lukas Czerner
2022-08-09 9:15 ` Sergey Senozhatsky
2022-08-09 9:53 ` Lukas Czerner
[not found] ` <20220830214626.26544-1-charlie39@cock.li>
2022-08-31 7:55 ` Jiri Slaby