From: Palash Oswal <oswalpalash@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	syzbot+11bf59db879676f59e52@syzkaller.appspotmail.com
Subject: Re: INFO: task hung in io_uring_cancel_sqpoll
Date: Sat, 15 May 2021 13:14:27 +0530
Message-ID: <CAGyP=7cr1S462e+ZNQY_s3ygmHbYpRo6OLrx7RCLKX8h6F=OnQ@mail.gmail.com>
In-Reply-To: <7c993a83-392f-39d0-7a6f-c78f121f5ae2@gmail.com>

On Fri, May 14, 2021 at 3:01 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 5/13/21 10:28 PM, Pavel Begunkov wrote:
> > On 5/10/21 5:47 AM, Palash Oswal wrote:
> >> On Mon, May 3, 2021 at 4:15 PM Pavel Begunkov <asml.silence@gmail.com>
> >> wrote:
> >>
> >>> On 5/3/21 5:41 AM, Palash Oswal wrote:
> >>>> On Mon, May 3, 2021 at 3:42 AM Pavel Begunkov <asml.silence@gmail.com>
> >>> wrote:
> >>>>> The test might be very useful. Would you send a patch to
> >>>>> liburing? Or mind the repro being taken as a test?
> >>>>
> >>>> Pavel,
> >>>>
> >>>> I'm working on a PR for liburing with this test. Do you know how I can
> >>>
> >>> Perfect, thanks
> >>>
> >>>> address this behavior?
> >>>
> >>> As the hang is in sqpoll cancellations, it's most probably
> >>> fixed in 5.13 and again awaits being picked up for stable.
> >>>
> >>> Don't know about the segfaults, but it has been like that for a
> >>> while; syz reproducers often fault for me and still exit with 0
> >>> in the end. So, I wouldn't worry about it.
> >>>
> >>>
> >> Hey Pavel,
> >> The bug actually fails to reproduce on 5.12 when the fork() call is made by
> >> the runtests.sh script. This causes the program to end correctly, and the
> >> hang does not occur. I verified this on 5.12 where the bug isn't patched.
> >> Just running `sqpoll-cancel-hang` directly triggers the bug, whereas running
> >> it after being forked from runtests.sh does not.
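
For anyone reproducing this by hand: below is a minimal sketch (mine, not the
actual syzkaller reproducer or the liburing test), built with -luring, of the
kind of SQPOLL ring whose teardown exercises the sqpoll cancellation path the
hang sits in:

/* sqpoll-sketch.c: hypothetical minimal example, not sqpoll-cancel-hang */
#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_params p = {
		.flags = IORING_SETUP_SQPOLL,
		.sq_thread_idle = 1000,
	};
	int ret = io_uring_queue_init_params(8, &ring, &p);

	if (ret < 0) {
		/* SQPOLL needs privileges on older kernels */
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/* queue one no-op so the iou-sqp thread has something to pick up */
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	io_uring_prep_nop(sqe);
	io_uring_submit(&ring);

	/* ring teardown parks and cancels the sq thread on the way out */
	io_uring_queue_exit(&ring);
	return 0;
}
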
> >
> > I see. FYI, it's always good to wait for 5 minutes, because some useful
> > logs are not generated immediately but come from timeout-based hang detection.
> >
> > I'd think that may be due to CLONE_IO and how, and to whom, it binds
> > workers, but can you try first:
> >
> > 1) timeout -s INT -k $TIMEOUT $TIMEOUT ./sqpoll-cancel-hang
>
> edit:
>
> timeout -s INT -k 60 60 ./sqpoll-cancel-hang
>
> And run it privileged, as root or via sudo.
>
> >
> > 2) remove the timeout from <liburing>/test/runtests.sh and run
> > "./runtests.sh sqpoll-cancel-hang" again, looking for faults?
> >
> > diff --git a/test/runtests.sh b/test/runtests.sh
> > index e8f4ae5..2b51dca 100755
> > --- a/test/runtests.sh
> > +++ b/test/runtests.sh
> > @@ -91,7 +91,8 @@ run_test()
> >       fi
> >
> >       # Run the test
> > -     timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> > +     # timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> > +     ./$test_name $dev
> >       local status=$?
> >
> >       # Check test status

root@syzkaller:~/liburing/test# timeout -s INT -k 60 60 ./sqpoll-cancel-hang
[   19.381358] sqpoll-cancel-h[300]: segfault at 0 ip 0000556f7fa325e3 sp 00007ffee497d980 error 6 in s]
[   19.387323] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 48 0f 42 f0 45 31 c6
root@syzkaller:~/liburing/test# [  243.511620] INFO: task iou-sqp-300:301 blocked for more than 120 seconds.
[  243.514146]       Not tainted 5.12.0 #142
[  243.515301] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.517629] task:iou-sqp-300     state:D stack:    0 pid:  301 ppid:     1 flags:0x00004004
[  243.520155] Call Trace:
[  243.520956]  __schedule+0xb1d/0x1130
[  243.522102]  ? __sched_text_start+0x8/0x8
[  243.523195]  ? io_wq_worker_sleeping+0x145/0x500
[  243.524588]  schedule+0x131/0x1c0
[  243.525892]  io_uring_cancel_sqpoll+0x288/0x350
[  243.527610]  ? io_sq_thread_unpark+0xd0/0xd0
[  243.529084]  ? mutex_lock+0xbb/0x130
[  243.530327]  ? init_wait_entry+0xe0/0xe0
[  243.532805]  ? wait_for_completion_killable_timeout+0x20/0x20
[  243.535411]  io_sq_thread+0x174c/0x18c0
[  243.536520]  ? io_rsrc_put_work+0x380/0x380
[  243.537904]  ? init_wait_entry+0xe0/0xe0
[  243.538935]  ? _raw_spin_lock_irq+0xa5/0x180
[  243.540203]  ? _raw_spin_lock_irqsave+0x190/0x190
[  243.542398]  ? calculate_sigpending+0x6b/0xa0
[  243.543868]  ? io_rsrc_put_work+0x380/0x380
[  243.545377]  ret_from_fork+0x22/0x30
^C
root@syzkaller:~/liburing/test# ps
  PID TTY          TIME CMD
  269 ttyS0    00:00:00 login
  294 ttyS0    00:00:00 bash
  300 ttyS0    00:00:00 sqpoll-cancel-h
  305 ttyS0    00:00:00 ps


After a reboot, with the runtests.sh diff applied (to remove the timeout):
root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
Running test sqpoll-cancel-hang:
[   45.332140] Running test sqpoll-cancel-hang:
[   45.352524] sqpoll-cancel-h[314]: segfault at 0 ip 000056025bd085e3 sp 00007fffb08e20b0 error 6 in s]
[   45.356601] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 48 0f 42 f0 45 31 c6
[  243.019384] INFO: task iou-sqp-314:315 blocked for more than 120 seconds.
[  243.021483]       Not tainted 5.12.0 #142
[  243.022633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.024651] task:iou-sqp-314     state:D stack:    0 pid:  315 ppid:   313 flags:0x00004004
[  243.026822] Call Trace:
[  243.027478]  __schedule+0xb1d/0x1130
[  243.028382]  ? __sched_text_start+0x8/0x8
[  243.029536]  ? io_wq_worker_sleeping+0x145/0x500
[  243.030932]  schedule+0x131/0x1c0
[  243.031920]  io_uring_cancel_sqpoll+0x288/0x350
[  243.033393]  ? io_sq_thread_unpark+0xd0/0xd0
[  243.034713]  ? mutex_lock+0xbb/0x130
[  243.035775]  ? init_wait_entry+0xe0/0xe0
[  243.037036]  ? wait_for_completion_killable_timeout+0x20/0x20
[  243.039492]  io_sq_thread+0x174c/0x18c0
[  243.040894]  ? io_rsrc_put_work+0x380/0x380
[  243.042463]  ? init_wait_entry+0xe0/0xe0
[  243.043990]  ? _raw_spin_lock_irq+0xa5/0x180
[  243.045581]  ? _raw_spin_lock_irqsave+0x190/0x190
[  243.047545]  ? calculate_sigpending+0x6b/0xa0
[  243.049262]  ? io_rsrc_put_work+0x380/0x380
[  243.050861]  ret_from_fork+0x22/0x30
^C
root@syzkaller:~/liburing/test# ps
  PID TTY          TIME CMD
  285 ttyS0    00:00:00 login
  300 ttyS0    00:00:00 bash
  314 ttyS0    00:00:00 sqpoll-cancel-h
  318 ttyS0    00:00:00 ps


With runtests.sh unchanged:
root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
[   49.634886] Running test sqpoll-cancel-hang:
Running test sqpoll-cancel-hang:
[   49.658365] sqpoll-cancel-h[302]: segfault at 0 ip 000055a76e99c5e3 sp 00007ffdc255d1a0 error 6 in s]
[   49.661703] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 48 0f 42 f0 45 31 c6
Test sqpoll-cancel-hang timed out (may not be a failure)
All tests passed
root@syzkaller:~/liburing/test# ps
  PID TTY          TIME CMD
  269 ttyS0    00:00:00 login
  287 ttyS0    00:00:00 bash
  302 ttyS0    00:00:00 sqpoll-cancel-h
  309 ttyS0    00:00:00 ps
root@syzkaller:~/liburing/test# [  243.324831] INFO: task iou-sqp-302:303 blocked for more than 120 seconds.
[  243.328320]       Not tainted 5.12.0 #142
[  243.330361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.333930] task:iou-sqp-302     state:D stack:    0 pid:  303 ppid:     1 flags:0x00004004
[  243.337758] Call Trace:
[  243.338926]  __schedule+0xb1d/0x1130
[  243.340801]  ? __sched_text_start+0x8/0x8
[  243.342690]  ? io_wq_worker_sleeping+0x145/0x500
[  243.344903]  schedule+0x131/0x1c0
[  243.346626]  io_uring_cancel_sqpoll+0x288/0x350
[  243.348762]  ? io_sq_thread_unpark+0xd0/0xd0
[  243.351036]  ? mutex_lock+0xbb/0x130
[  243.352737]  ? init_wait_entry+0xe0/0xe0
[  243.354673]  ? wait_for_completion_killable_timeout+0x20/0x20
[  243.356989]  io_sq_thread+0x174c/0x18c0
[  243.358559]  ? io_rsrc_put_work+0x380/0x380
[  243.359981]  ? init_wait_entry+0xe0/0xe0
[  243.361185]  ? _raw_spin_lock_irq+0x110/0x180
[  243.362958]  ? _raw_spin_lock_irqsave+0x190/0x190
[  243.364260]  ? calculate_sigpending+0x6b/0xa0
[  243.365763]  ? io_rsrc_put_work+0x380/0x380
[  243.367041]  ret_from_fork+0x22/0x30
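
Note the ps output above: even after "All tests passed", sqpoll-cancel-h is
still hanging around. A hypothetical wrapper (just a sketch of mine, not what
runtests.sh actually does) that would turn that into a reported failure runs
the test in a child and polls it against a deadline, rather than blocking in
waitpid() or relying on signals, since a task stuck in D state won't honor
them:

/* hang-wrapper.c: hypothetical harness sketch, not part of liburing */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s ./sqpoll-cancel-hang\n", argv[0]);
		return 2;
	}

	pid_t pid = fork();
	if (pid == 0) {
		execv(argv[1], &argv[1]);
		_exit(127);
	}

	for (int i = 0; i < 60; i++) {	/* ~60 second deadline */
		int status;

		/* WNOHANG: never block behind a hung child */
		if (waitpid(pid, &status, WNOHANG) == pid)
			return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
		sleep(1);
	}

	fprintf(stderr, "still running after 60s, likely hung\n");
	kill(pid, SIGKILL);	/* best effort; a D-state task may ignore it */
	return 1;
}

Polling with WNOHANG keeps the harness itself from getting stuck behind the
hung child, which is the failure mode the "All tests passed" run masks.
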




> >
> >
> >>
> >> The segfaults are benign, but notice the "All tests passed" in the previous
> >> mail. It should not have passed, as the run was on 5.12. Therefore I wanted
> >> to ask your input on how to address this odd behaviour, where the
> >> involvement of runtests.sh actually mitigated the bug.
> >>
> >>
> >>
> >>>> root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
> >>>> Running test sqp[   15.310997] Running test sqpoll-cancel-hang:
> >>>> oll-cancel-hang:
> >>>> [   15.333348] sqpoll-cancel-h[305]: segfault at 0 ip 000055ad00e265e3
> >>> sp]
> >>>> [   15.334940] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03
> >>> 46
> >>>> All tests passed
> >>>>
> >>>> root@syzkaller:~/liburing/test# ./sqpoll-cancel-hang
> >>>> [   13.572639] sqpoll-cancel-h[298]: segfault at 0 ip 00005634c4a455e3
> >>> sp]
> >>>> [   13.576506] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03
> >>> 46
> >>>> [   23.350459] random: crng init done
> >>>> [   23.352837] random: 7 urandom warning(s) missed due to ratelimiting
> >>>> [  243.090865] INFO: task iou-sqp-298:299 blocked for more than 120
> >>> secon.
> >>>> [  243.095187]       Not tainted 5.12.0 #142
> >>>> [  243.099800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>> disable.
> >>>> [  243.105928] task:iou-sqp-298     state:D stack:    0 pid:  299 ppid:
> >>> 4
> >>>> [  243.111044] Call Trace:
> >>>> [  243.112855]  __schedule+0xb1d/0x1130
> >>>> [  243.115549]  ? __sched_text_start+0x8/0x8
> >>>> [  243.118328]  ? io_wq_worker_sleeping+0x145/0x500
> >>>> [  243.121790]  schedule+0x131/0x1c0
> >>>> [  243.123698]  io_uring_cancel_sqpoll+0x288/0x350
> >>>> [  243.125977]  ? io_sq_thread_unpark+0xd0/0xd0
> >>>> [  243.128966]  ? mutex_lock+0xbb/0x130
> >>>> [  243.132572]  ? init_wait_entry+0xe0/0xe0
> >>>> [  243.135429]  ? wait_for_completion_killable_timeout+0x20/0x20
> >>>> [  243.138303]  io_sq_thread+0x174c/0x18c0
> >>>> [  243.140162]  ? io_rsrc_put_work+0x380/0x380
> >>>> [  243.141613]  ? init_wait_entry+0xe0/0xe0
> >>>> [  243.143686]  ? _raw_spin_lock_irq+0xa5/0x180
> >>>> [  243.145619]  ? _raw_spin_lock_irqsave+0x190/0x190
> >>>> [  243.147671]  ? calculate_sigpending+0x6b/0xa0
> >>>> [  243.149036]  ? io_rsrc_put_work+0x380/0x380
> >>>> [  243.150694]  ret_from_fork+0x22/0x30
> >>>>
> >>>> Palash
> >>>>
> >>>
> >>> --
> >>> Pavel Begunkov
> >>>
> >>
> >
>
> --
> Pavel Begunkov
