* new 9p kasan splat in 6.9
@ 2024-03-31  5:33 Kent Overstreet
  2024-04-02  0:02 ` Eric Van Hensbergen
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2024-03-31  5:33 UTC (permalink / raw)
  To: v9fs

00000 Running test kasan-ec.ktest on farm2 at /home/testdashboard/linux-5
00164 building kernel... done
00169 systemd[1]: Failed to find module 'autofs4'
00170 ==================================================================
00170 BUG: KASAN: slab-use-after-free in v9fs_stat2inode_dotl+0x7f8/0x988
00170 Read of size 8 at addr ffff0000c12f9000 by task mount/217
00170 
00170 CPU: 3 PID: 217 Comm: mount Not tainted 6.9.0-rc1-ktest-ga097468ffe82 #10998
00170 Hardware name: linux,dummy-virt (DT)
00170 Call trace:
00170  dump_backtrace+0xa4/0xe0
00170  show_stack+0x1c/0x30
00170  dump_stack_lvl+0x70/0x88
00170  print_report+0x110/0x5b8
00170  kasan_report+0x80/0xc0
00170  __asan_report_load8_noabort+0x1c/0x28
00170  v9fs_stat2inode_dotl+0x7f8/0x988
00170  v9fs_fid_iget_dotl+0x164/0x1f0
00170  v9fs_mount+0x380/0x718
00170  legacy_get_tree+0xd4/0x198
00170  vfs_get_tree+0x78/0x240
00170  path_mount+0xc6c/0x15f0
00170  do_mount+0xc4/0x100
00170  __arm64_sys_mount+0x228/0x330
00170  invoke_syscall.constprop.0+0x74/0x1e8
00170  do_el0_svc+0xc8/0x200
00170  el0_svc+0x20/0x60
00170  el0t_64_sync_handler+0xb8/0xc0
00170  el0t_64_sync+0x14c/0x150
00170 
00170 Allocated by task 217:
00170 
00170 Freed by task 217:
00170 
00170 The buggy address belongs to the object at ffff0000c12f9000
00170  which belongs to the cache kmalloc-192 of size 192
00170 The buggy address is located 0 bytes inside of
00170  freed 192-byte region [ffff0000c12f9000, ffff0000c12f90c0)
00170 
00170 The buggy address belongs to the physical page:
00170 
00170 Memory state around the buggy address:
00170  ffff0000c12f8f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00170  ffff0000c12f8f80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
00170 >ffff0000c12f9000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
00170                    ^
00170  ffff0000c12f9080: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
00170  ffff0000c12f9100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
00170 ==================================================================
00170 Kernel panic - not syncing: kasan.fault=panic set ...
00170 CPU: 3 PID: 217 Comm: mount Not tainted 6.9.0-rc1-ktest-ga097468ffe82 #10998
00170 Hardware name: linux,dummy-virt (DT)
00170 Call trace:
00170  dump_backtrace+0xa4/0xe0
00170  show_stack+0x1c/0x30
00170  dump_stack_lvl+0x34/0x88
00170  dump_stack+0x18/0x20
00170  panic+0x4dc/0x520
00170  end_report+0xec/0xf0
00170  kasan_report+0x90/0xc0
00170  __asan_report_load8_noabort+0x1c/0x28
00170  v9fs_stat2inode_dotl+0x7f8/0x988
00170  v9fs_fid_iget_dotl+0x164/0x1f0
00170  v9fs_mount+0x380/0x718
00170  legacy_get_tree+0xd4/0x198
00170  vfs_get_tree+0x78/0x240
00170  path_mount+0xc6c/0x15f0
00170  do_mount+0xc4/0x100
00170  __arm64_sys_mount+0x228/0x330
00170  invoke_syscall.constprop.0+0x74/0x1e8
00170  do_el0_svc+0xc8/0x200
00170  el0_svc+0x20/0x60
00170  el0t_64_sync_handler+0xb8/0xc0
00170  el0t_64_sync+0x14c/0x150
00170 SMP: stopping secondary CPUs
00170 Kernel Offset: disabled
00170 CPU features: 0x0,00000003,80000008,4240500b
00170 Memory Limit: none
00170 ---[ end Kernel panic - not syncing: kasan.fault=panic set ... ]---
00175 ========= FAILED TIMEOUT (no test) in 1200s
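
(For anyone digging into this: a minimal sketch of how to map the reported
offset back to a source line with the kernel's own helpers; the vmlinux path
below is illustrative, use the one from the failing KASAN build.)

# Resolve a single frame to file:line (vmlinux path is illustrative):
$ ./scripts/faddr2line ktest-out/vmlinux v9fs_stat2inode_dotl+0x7f8/0x988
# Or feed the whole splat in on stdin and decode every frame:
$ ./scripts/decode_stacktrace.sh ktest-out/vmlinux < splat.txt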

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-03-31  5:33 new 9p kasan splat in 6.9 Kent Overstreet
@ 2024-04-02  0:02 ` Eric Van Hensbergen
  2024-04-02  0:07   ` Kent Overstreet
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Van Hensbergen @ 2024-04-02  0:02 UTC (permalink / raw)
  To: Kent Overstreet, v9fs

This should be fixed in -rc2.

March 31, 2024 at 12:33 AM, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
>
> 00170 BUG: KASAN: slab-use-after-free in v9fs_stat2inode_dotl+0x7f8/0x988
> 00170 Read of size 8 at addr ffff0000c12f9000 by task mount/217
> [...]
> 00175 ========= FAILED TIMEOUT (no test) in 1200s

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-02  0:02 ` Eric Van Hensbergen
@ 2024-04-02  0:07   ` Kent Overstreet
  2024-04-02  0:33     ` Eric Van Hensbergen
  2024-04-15 13:48     ` Eric Van Hensbergen
  0 siblings, 2 replies; 11+ messages in thread
From: Kent Overstreet @ 2024-04-02  0:07 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: v9fs

On Tue, Apr 02, 2024 at 12:02:43AM +0000, Eric Van Hensbergen wrote:
> This should be fixed in -rc2.

I'm still seeing sporadic weird fstests failures on rc2 though - note
that I'm not testing 9p specifically, but using it for copying results
to the host.

Example:
https://evilpiepirate.org/~testdashboard/c/cecfed9b446da5fba9d73e6448c9f0d1ff5d95ff/xfstests-nocow.generic.035/log.br

I see runs where this causes the number of test failures to jump from
the typical ~30 all the way up to 50; IOW, these happen all in a row
when they happen.

https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-testing&commit=cecfed9b446da5fba9d73e6448c9f0d1ff5d95ff

What's going on with 9p testing? It seems we've got multiple bugs that
made it to rc1 that would have been caught with only moderate testing.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-02  0:07   ` Kent Overstreet
@ 2024-04-02  0:33     ` Eric Van Hensbergen
  2024-04-02  1:12       ` Kent Overstreet
  2024-04-15 13:48     ` Eric Van Hensbergen
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Van Hensbergen @ 2024-04-02  0:33 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Eric Van Hensbergen, v9fs

I've got standard regression runs with dbench, fsx, postmark and a few
special-case ones to check problem areas we've had in the past.  There
was a pretty extensive rework of some of the caching mechanisms -- so
I was expecting this to be a busy cycle.  There were a couple of fixes
that were in the pipe but missed -rc1 because day-job deadlines landed
at the same time as the kernel merge window; those are mostly upstream
now, except for a few legacy fixes which are still in my fixes/next
tree.
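
(For reference, a rough sketch of what runs like those look like against a
9p mount; the mount point, runtime, and op count below are illustrative,
not the actual harness.)

# dbench: 4 clients for 60 seconds in the 9p-backed directory
$ dbench -D /mnt/9p -t 60 4
# fsx as built in the xfstests tree, 10000 ops against one file
$ ./ltp/fsx -N 10000 /mnt/9p/fsx.tmp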

Looking at your example above -- is it the case that for most of these
there is essentially a missing log file that isn't being created by
9p?  Any other details you can give me from your environment (how is
9p mounted, is this qemu?, etc.)?

       -eric

On Mon, Apr 1, 2024 at 7:07 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Tue, Apr 02, 2024 at 12:02:43AM +0000, Eric Van Hensbergen wrote:
> > This should be fixed in -rc2.
>
> I'm still seeing sporadic weird fstests failures on rc2 though - note
> that I'm not testing 9p specifically, but using it for copying results
> to the host.
>
> Example:
> https://evilpiepirate.org/~testdashboard/c/cecfed9b446da5fba9d73e6448c9f0d1ff5d95ff/xfstests-nocow.generic.035/log.br
>
> I see runs where this causes the number of test failures to jump from
> the typical ~30 all the way up to 50; IOW, these happen all in a row
> when they happen.
>
> https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-testing&commit=cecfed9b446da5fba9d73e6448c9f0d1ff5d95ff
>
> What's going on with 9p testing? It seems we've got multiple bugs that
> made it to rc1 that would have been caught with only moderate testing.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-02  0:33     ` Eric Van Hensbergen
@ 2024-04-02  1:12       ` Kent Overstreet
  2024-04-02  1:27         ` Eric Van Hensbergen
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2024-04-02  1:12 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: Eric Van Hensbergen, v9fs

On Mon, Apr 01, 2024 at 07:33:50PM -0500, Eric Van Hensbergen wrote:
> I've got standard regression runs with dbench, fsx, postmark and a few
> special case ones to check problem areas we've had in the past.  There
> was a pretty extensive rework of some of the caching mechanisms -- so
> I was expecting this to be a busy cycle.  There were a couple of fixes
> that were in the pipe but missed -rc1 because of day-job deadlines
> landing same time as kernel merge window, but these are mostly
> upstream now except for a few legacy fixes which are still in my
> fixes/next tree.

Sounds like things were rushed a bit then, and there's some room for
improving the testing.

If you need testing automation I might be able to help.

> Looking at your example above -- is it the case that most of these,
> there is essentially a missing log file that isn't being created by
> 9p?

Appears to be, but it doesn't happen often enough for me to repro
locally.

> Any other details you can give me from your environment (how is
> 9p mounted, is this qemu?, etc.)?

It's qemu, standard mount options:
host /host 9p rw,relatime,access=client,trans=virtio 0 0
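
(For context, a minimal sketch of how such a share is typically wired up;
the exported host path and security_model are assumptions, only the mount
tag and guest options above come from the actual setup.)

# Host side: export a directory to the guest over virtio-9p
$ qemu-system-x86_64 ... \
    -virtfs local,path=/srv/ktest-results,mount_tag=host,security_model=none,id=host0
# Guest side: roughly what produces the fstab line above
$ mount -t 9p -o trans=virtio,access=client host /host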

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-02  1:12       ` Kent Overstreet
@ 2024-04-02  1:27         ` Eric Van Hensbergen
  2024-04-02  1:34           ` Kent Overstreet
  2024-04-10 11:43           ` Eric Van Hensbergen
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Van Hensbergen @ 2024-04-02  1:27 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: v9fs

April 1, 2024 at 8:12 PM, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
> On Mon, Apr 01, 2024 at 07:33:50PM -0500, Eric Van Hensbergen wrote:
>
> Sounds like things were rushed a bit then, and there's some room for
> improving the testing.

Well, most of the patches had been in there almost since the last
release; it's just that I didn't have time to test and incorporate fixes
during the -rc cycle, and most of the issues I saw looked like corner
cases from kasan, so I didn't give them as much attention as perhaps I
should have.

> If you need testing automation I might be able to help.

I do like your dashboards and whatnot; not sure if I'm up for trying Nix again though ;)

> > Looking at your example above -- is it the case that most of these,
> > there is essentially a missing log file that isn't being created by
> > 9p?
>
> Appears to be, but it doesn't happen often enough for me to repro
> locally.

Okay, that's useful; I'll see if I can track things down.

> > Any other details you can give me from your environment (how is
> > 9p mounted, is this qemu?, etc.)?
>
> It's qemu, standard mount options:
> host /host 9p rw,relatime,access=client,trans=virtio 0 0

access=client may be someplace for me to start; I don't usually use
ACL-based checks in my regression sweeps.  I'll update to add that to
the matrix, as well as put fstests back in the default regressions.

trans=virtio,version=9p2000.L,cache=none,access=user

should still be the most stable, but this recent set of patches does
change inode (and inode_no) handling, and I believe by default inodes
may stick around even though with nocache they should always be
refreshed.
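
(Concretely, reusing the tag and mount point from the fstab line above,
that would be something like:)

$ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,access=user host /host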

     -eric

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-02  1:27         ` Eric Van Hensbergen
@ 2024-04-02  1:34           ` Kent Overstreet
  2024-04-10 11:43           ` Eric Van Hensbergen
  1 sibling, 0 replies; 11+ messages in thread
From: Kent Overstreet @ 2024-04-02  1:34 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: v9fs

On Tue, Apr 02, 2024 at 01:27:06AM +0000, Eric Van Hensbergen wrote:
> April 1, 2024 at 8:12 PM, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
> > On Mon, Apr 01, 2024 at 07:33:50PM -0500, Eric Van Hensbergen wrote:
> >
> > Sounds like things were rushed a bit then, and there's some room for
> > improving the testing.
>
> Well, most of the patches had been in there almost since the last release, its just I didn't have time to test and incorporate fixes during the -rc cycle and most of the ones I saw looked like corner cases from kasan so I didn't give them as much attention as perhaps I should have.
> 
> > If you need testing automation I might be able to help.
> 
> I do like your dashboards and what not, not sure if I'm up for trying Nix again though ;)

There's no Nix required for my stuff (yet :)

I can point my cluster at your branch(es) with whatever tests you want
to run if you'd be willing to chip in on server costs; the Hetzner
machines I use are ~250/month, and one more would be sufficient.

(Imagine what we could do if someone ponied up for a dozen of those; I'm
doing all my testing with just two, and the 80-core ARM machines are
_wonderful_).

> access=client may be someplace for me to start, I don't usually use ACL based checks in my regression sweeps.  I'll update to add that to the matrix as well as put fstest back in the default regressions.
> 
> trans=virtio,version=9p2000.L,cache=none,access=user
> 
> should still be the most stable, but this recent set of patches does change inode (and inode_no) handling and I believe by default they may stick around even though with nocache they should always be refreshed.

If you don't find a way to repro it or otherwise track it down, I might
be able to try that in a day or so.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-02  1:27         ` Eric Van Hensbergen
  2024-04-02  1:34           ` Kent Overstreet
@ 2024-04-10 11:43           ` Eric Van Hensbergen
  2024-04-10 17:02             ` Kent Overstreet
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Van Hensbergen @ 2024-04-10 11:43 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: v9fs

I'm having a hard time reproducing the problem.

April 1, 2024 at 8:27 PM, "Eric Van Hensbergen" <eric.vanhensbergen@linux.dev> wrote:
> April 1, 2024 at 8:12 PM, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
> > Any other details you can give me from your environment (how is
> > 9p mounted, is this qemu?, etc.)?
> >
> > It's qemu, standard mount options:
> > host /host 9p rw,relatime,access=client,trans=virtio 0 0

I'm continuing to have a hard time reproducing on my test system - I've
added a bunch of BUG_ON to the code to catch where I thought potential
problems might be and have not tripped across any of them.  Can I get
three pieces of data:
  a) which 9P CONFIG options are enabled in your kernel?
  b) what is the underlying file system on the server?
  c) what version of qemu are you running?

I'm going to try to spend all day today chasing this down.  I may revert
anyway if I can't reproduce, since there's clearly degraded behavior --
I'd just feel better about it if I understood what the problem was.  In
your reverted version that doesn't experience the problem, was it just
the last two patches that you reverted or the whole series?

   -eric

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-10 11:43           ` Eric Van Hensbergen
@ 2024-04-10 17:02             ` Kent Overstreet
  2024-04-10 18:17               ` Eric Van Hensbergen
  0 siblings, 1 reply; 11+ messages in thread
From: Kent Overstreet @ 2024-04-10 17:02 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: v9fs

On Wed, Apr 10, 2024 at 11:43:44AM +0000, Eric Van Hensbergen wrote:
> I'm having a hard time reproducing the 
> 
> April 1, 2024 at 8:27 PM, "Eric Van Hensbergen" <eric.vanhensbergen@linux.dev> wrote:
> > 
> > April 1, 2024 at 8:12 PM, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
> > > 
> > > Any other details you can give me from your environment (how is
> > >  9p mounted, is this qemu?, etc.)?
> > >  It's qemu, standard mount options:
> > > 
> > >  host /host 9p rw,relatime,access=client,trans=virtio 0 0
> > > 
> > 
> 
> I'm continuing to have a hard time reproducing on my test system -
> I've added a bunch of BUG_ON to the code to catch where I thought
> potential problems might be and have not tripped across any of them.
> Can I get three pieces of data:
>   a) which 9P CONFIG options are enabled in your kernel?

$ grep 9P ktest-out/kernel_build.x86_64/.config
CONFIG_NET_9P=y
CONFIG_NET_9P_FD=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_DEBUG is not set
CONFIG_9P_FS=y
# CONFIG_9P_FS_POSIX_ACL is not set
# CONFIG_9P_FS_SECURITY is not set

>   b) what is the underlying file system on the server

ext4

>   c) what version of qemu are you running

QEMU emulator version 8.2.1 (Debian 1:8.2.1+ds-1)

> I'm going to try to spend all day today chasing this down, I may
> revert anyways if I can't reproduce since there's clearly degraded
> behavior -- I'd just feel better about it if I understood what the
> problem was.  In your reverted version that doesn't experience the
> problem, was it just the last two patches that you reverted or the
> whole series?

Whole series. I've tested each revert individually, but unfortunately
the bug is so sporadic that none of them jumped out as the culprit (and
I'm running 1600 tests on each commit).

I don't have a way to tell my CI "run this test x number of times" or
I'd try to give you better information.
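
(For the record, the manual equivalent when fstests can be driven by hand
outside the CI; the test name is just the one from the failing run above.)

# Re-run a single test until it fails, from the fstests checkout:
$ for i in $(seq 1 100); do ./check generic/035 || break; done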

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-10 17:02             ` Kent Overstreet
@ 2024-04-10 18:17               ` Eric Van Hensbergen
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Van Hensbergen @ 2024-04-10 18:17 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Eric Van Hensbergen, v9fs

Thanks.  That's useful; that is the same kernel config my regressions
run with (and it helps me discount ACLs somewhat, since they aren't
configured, so I won't waste any more time on that rathole).  Oleg's
report (https://lore.kernel.org/v9fs/20240408141436.GA17022@redhat.com/)
seems to have a reproducible ELOOP (which looks like it was underlying
at least one of your reports as well), so I'm trying to reproduce that
now.

      -eric

On Wed, Apr 10, 2024 at 12:02 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Wed, Apr 10, 2024 at 11:43:44AM +0000, Eric Van Hensbergen wrote:
> > I'm having a hard time reproducing the
> >
> > April 1, 2024 at 8:27 PM, "Eric Van Hensbergen" <eric.vanhensbergen@linux.dev> wrote:
> > >
> > > April 1, 2024 at 8:12 PM, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
> > > >
> > > > Any other details you can give me from your environment (how is
> > > >  9p mounted, is this qemu?, etc.)?
> > > >  It's qemu, standard mount options:
> > > >
> > > >  host /host 9p rw,relatime,access=client,trans=virtio 0 0
> > > >
> > >
> >
> > I'm continuing to have a hard time reproducing on my test system -
> > I've added a bunch of BUG_ON to the code to catch where I thought
> > potential problems might be and have not tripped across any of them.
> > Can I get three pieces of data:
> >   a) which 9P CONFIG options are enabled in your kernel?
>
> $ grep 9P ktest-out/kernel_build.x86_64/.config
> CONFIG_NET_9P=y
> CONFIG_NET_9P_FD=y
> CONFIG_NET_9P_VIRTIO=y
> # CONFIG_NET_9P_DEBUG is not set
> CONFIG_9P_FS=y
> # CONFIG_9P_FS_POSIX_ACL is not set
> # CONFIG_9P_FS_SECURITY is not set
>
> >   b) what is the underlying file system on the server
>
> ext4
>
> >   c) what version of qemu are you running
>
> QEMU emulator version 8.2.1 (Debian 1:8.2.1+ds-1)
>
> > I'm going to try to spend all day today chasing this down, I may
> > revert anyways if I can't reproduce since there's clearly degraded
> > behavior -- I'd just feel better about it if I understood what the
> > problem was.  In your reverted version that doesn't experience the
> > problem, was it just the last two patches that you reverted or the
> > whole series?
>
> Whole series. I've tested each revert individually, but unfortunately
> the bug is so sporadic that none of them jumped out as the culprit (and
> I'm running 1600 tests on each commit).
>
> I don't have a way to tell my CI "run this test x number of times" or
> I'd try to give you better information.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: new 9p kasan splat in 6.9
  2024-04-02  0:07   ` Kent Overstreet
  2024-04-02  0:33     ` Eric Van Hensbergen
@ 2024-04-15 13:48     ` Eric Van Hensbergen
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Van Hensbergen @ 2024-04-15 13:48 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: v9fs

Kent - can you do me a favor and see if the current pending-fixes branch
of linux-next (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=pending-fixes)
solves your issues?  I'm still chasing some bugs when caches are
enabled, but your config is without caches, and I think the single
patch revert should solve the problem for you.  If not, I'll revert
the other major patch from this series until I can track down whatever
race condition is causing the instability.  It would also be useful to
know the underlying file system configuration on your setup (is it a
single device, which filesystem, etc.).
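
(Host-side, something along these lines answers that; the export path is
the same illustrative one as before, not the real one.)

# Which filesystem and device back the exported directory:
$ findmnt -T /srv/ktest-results
$ lsblk -f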

      -eric

On Mon, Apr 1, 2024 at 7:07 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Tue, Apr 02, 2024 at 12:02:43AM +0000, Eric Van Hensbergen wrote:
> > This should be fixed in -rc2.
>
> I'm still seeing sporadic weird fstests failures on rc2 though - note
> that I'm not testing 9p specifically, but using it for copying results
> to the host.
>
> Example:
> https://evilpiepirate.org/~testdashboard/c/cecfed9b446da5fba9d73e6448c9f0d1ff5d95ff/xfstests-nocow.generic.035/log.br
>
> I see runs where this causes the number of test failures to jump from
> the typical ~30 all the way up to 50; IOW, these happen all in a row
> when they happen.
>
> https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-testing&commit=cecfed9b446da5fba9d73e6448c9f0d1ff5d95ff
>
> What's going on with 9p testing? It seems we've got multiple bugs that
> made it to rc1 that would have been caught with only moderate testing.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-04-15 13:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-31  5:33 new 9p kasan splat in 6.9 Kent Overstreet
2024-04-02  0:02 ` Eric Van Hensbergen
2024-04-02  0:07   ` Kent Overstreet
2024-04-02  0:33     ` Eric Van Hensbergen
2024-04-02  1:12       ` Kent Overstreet
2024-04-02  1:27         ` Eric Van Hensbergen
2024-04-02  1:34           ` Kent Overstreet
2024-04-10 11:43           ` Eric Van Hensbergen
2024-04-10 17:02             ` Kent Overstreet
2024-04-10 18:17               ` Eric Van Hensbergen
2024-04-15 13:48     ` Eric Van Hensbergen
