All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Palash Oswal <oswalpalash@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	syzbot+11bf59db879676f59e52@syzkaller.appspotmail.com
Subject: Re: INFO: task hung in io_uring_cancel_sqpoll
Date: Thu, 13 May 2021 22:31:09 +0100	[thread overview]
Message-ID: <7c993a83-392f-39d0-7a6f-c78f121f5ae2@gmail.com> (raw)
In-Reply-To: <4127fb94-89d2-4e36-8835-514118cb1cce@gmail.com>

On 5/13/21 10:28 PM, Pavel Begunkov wrote:
> On 5/10/21 5:47 AM, Palash Oswal wrote:
>> On Mon, May 3, 2021 at 4:15 PM Pavel Begunkov <asml.silence@gmail.com>
>> wrote:
>>
>>> On 5/3/21 5:41 AM, Palash Oswal wrote:
>>>> On Mon, May 3, 2021 at 3:42 AM Pavel Begunkov <asml.silence@gmail.com>
>>> wrote:
>>>>> The test might be very useful. Would you send a patch to
>>>>> liburing? Or mind the repro being taken as a test?
>>>>
>>>> Pavel,
>>>>
>>>> I'm working on a PR for liburing with this test. Do you know how I can
>>>
>>> Perfect, thanks
>>>
>>>> address this behavior?
>>>
>>> As the hang is sqpoll cancellations, it's most probably
>>> fixed in 5.13 + again awaits to be taken for stable.
>>>
>>> Don't know about segfaults, but it was so for long, and
>>> syz reproducers are ofthen faults for me, and exit with 0
>>> in the end. So, I wouldn't worry about it.
>>>
>>>
>> Hey Pavel,
>> The bug actually fails to reproduce on 5.12 when the fork() call is made by
>> the runtests.sh script. This causes the program to end correctly, and the
>> hang does not occur. I verified this on 5.12 where the bug isn't patched.
>> Just running the `sqpoll-cancel-hang` triggers the bug; whereas running it
>> after being forked from runtests.sh does not trigger the bug.
> 
> I see. fyi, it's always good to wait for 5 minutes, because some useful
> logs are not generated immediately but do timeout based hang detection.
> 
> I'd think that may be due CLONE_IO and how to whom it binds workers,
> but can you try first:
> 
> 1) timeout -s INT -k $TIMEOUT $TIMEOUT ./sqpoll-cancel-hang

edit:

timeout -s INT -k 60 60 ./sqpoll-cancel-hang

And privileged, root/sudo

> 
> 2) remove timeout from <liburing>/tests/Makefile and run
> "./runtests.sh sqpoll-cancel-hang" again looking for faults?
> 
> diff --git a/test/runtests.sh b/test/runtests.sh
> index e8f4ae5..2b51dca 100755
> --- a/test/runtests.sh
> +++ b/test/runtests.sh
> @@ -91,7 +91,8 @@ run_test()
>  	fi
>  
>  	# Run the test
> -	timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> +	# timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> +	./$test_name $dev
>  	local status=$?
>  
>  	# Check test status
> 
> 
>>
>> The segfaults are benign, but notice the "All tests passed" in the previous
>> mail. It should not have passed, as the run was on 5.12. Therefore I wanted
>> to ask your input on how to address this odd behaviour, where the
>> involvement of runtests.sh actually mitigated the bug.
>>
>>
>>
>>>> root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
>>>> Running test sqp[   15.310997] Running test sqpoll-cancel-hang:
>>>> oll-cancel-hang:
>>>> [   15.333348] sqpoll-cancel-h[305]: segfault at 0 ip 000055ad00e265e3
>>> sp]
>>>> [   15.334940] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03
>>> 46
>>>> All tests passed
>>>>
>>>> root@syzkaller:~/liburing/test# ./sqpoll-cancel-hang
>>>> [   13.572639] sqpoll-cancel-h[298]: segfault at 0 ip 00005634c4a455e3
>>> sp]
>>>> [   13.576506] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03
>>> 46
>>>> [   23.350459] random: crng init done
>>>> [   23.352837] random: 7 urandom warning(s) missed due to ratelimiting
>>>> [  243.090865] INFO: task iou-sqp-298:299 blocked for more than 120
>>> secon.
>>>> [  243.095187]       Not tainted 5.12.0 #142
>>>> [  243.099800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disable.
>>>> [  243.105928] task:iou-sqp-298     state:D stack:    0 pid:  299 ppid:
>>> 4
>>>> [  243.111044] Call Trace:
>>>> [  243.112855]  __schedule+0xb1d/0x1130
>>>> [  243.115549]  ? __sched_text_start+0x8/0x8
>>>> [  243.118328]  ? io_wq_worker_sleeping+0x145/0x500
>>>> [  243.121790]  schedule+0x131/0x1c0
>>>> [  243.123698]  io_uring_cancel_sqpoll+0x288/0x350
>>>> [  243.125977]  ? io_sq_thread_unpark+0xd0/0xd0
>>>> [  243.128966]  ? mutex_lock+0xbb/0x130
>>>> [  243.132572]  ? init_wait_entry+0xe0/0xe0
>>>> [  243.135429]  ? wait_for_completion_killable_timeout+0x20/0x20
>>>> [  243.138303]  io_sq_thread+0x174c/0x18c0
>>>> [  243.140162]  ? io_rsrc_put_work+0x380/0x380
>>>> [  243.141613]  ? init_wait_entry+0xe0/0xe0
>>>> [  243.143686]  ? _raw_spin_lock_irq+0xa5/0x180
>>>> [  243.145619]  ? _raw_spin_lock_irqsave+0x190/0x190
>>>> [  243.147671]  ? calculate_sigpending+0x6b/0xa0
>>>> [  243.149036]  ? io_rsrc_put_work+0x380/0x380
>>>> [  243.150694]  ret_from_fork+0x22/0x30
>>>>
>>>> Palash
>>>>
>>>
>>> --
>>> Pavel Begunkov
>>>
>>
> 

-- 
Pavel Begunkov

  reply	other threads:[~2021-05-13 21:31 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <0000000000000c97e505bdd1d60e@google.com>
     [not found] ` <cfa0f9b8-91ec-4772-a6c2-c5206f32373fn@googlegroups.com>
     [not found]   ` <53a22ab4-7a2d-4ebd-802d-9d1b4ce4e087n@googlegroups.com>
2021-04-30 14:21     ` INFO: task hung in io_uring_cancel_sqpoll Palash Oswal
2021-04-30 14:33       ` Pavel Begunkov
2021-04-30 15:02         ` Palash Oswal
2021-04-30 15:07           ` Pavel Begunkov
2021-04-30 15:11             ` Palash Oswal
2021-04-30 18:34           ` Pavel Begunkov
2021-04-30 18:34             ` syzbot
2021-04-30 21:04             ` Pavel Begunkov
2021-05-01  4:51               ` Palash Oswal
2021-05-01  7:39                 ` Pavel Begunkov
2021-05-02 10:33           ` Pavel Begunkov
2021-05-02 15:01             ` Palash Oswal
2021-05-02 15:56               ` Pavel Begunkov
2021-05-02 22:12       ` Pavel Begunkov
2021-05-03  4:41         ` Palash Oswal
2021-05-03 10:45           ` Pavel Begunkov
     [not found]             ` <CAGyP=7eoSfh7z638PnP5UF4xVKcrG1jB_qmFo6uPZ7iWfu_2sQ@mail.gmail.com>
2021-05-13 21:28               ` Pavel Begunkov
2021-05-13 21:31                 ` Pavel Begunkov [this message]
2021-05-15  7:44                   ` Palash Oswal
2021-06-12 15:48 ` syzbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c993a83-392f-39d0-7a6f-c78f121f5ae2@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oswalpalash@gmail.com \
    --cc=syzbot+11bf59db879676f59e52@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.