IO-Uring Archive mirror
 help / color / mirror / Atom feed
* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
       [not found] <313824bc-799d-414f-96b7-e6de57c7e21d@gmail.com>
@ 2024-05-21 16:16 ` Jens Axboe
  2024-05-21 16:22   ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 16:16 UTC (permalink / raw
  To: Andrew Udvare, regressions; +Cc: io-uring

(removing stable as this isn't a stable regression, and the commit
itself isn't marked for stable to begin with).

On 5/21/24 10:02 AM, Andrew Udvare wrote:
> #regzbot introduced: v6.8..v6.9-rc1
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=af5d68f8892f8ee8f137648b79ceb2abc153a19b
> 
> Since the above commit present in 6.9+, Node running a Yarn installation that executes a subprocess always shows the following:
> 
> /test # yarn --offline install
> yarn install v1.22.22
> warning package.json: "test" is also the name of a node core module
> warning test@1.0.0: "test" is also the name of a node core module
> [1/4] Resolving packages...
> [2/4] Fetching packages...
> [3/4] Linking dependencies...
> [4/4] Building fresh packages...
> error /test/node_modules/snyk: Command failed.
> Exit code: 126
> Command: node wrapper_dist/bootstrap.js exec
> Arguments:
> Directory: /test/node_modules/snyk
> Output:
> /bin/sh: node: Text file busy
> 
> The commit was found by bisection with a simple initramfs that just runs 'yarn --offline install' with a test project and cached Yarn packages.
> 
> To reproduce:
> 
> npm install -g yarn
> mkdir test
> cd test
> cat > package.json <<EOF
> {
>    "name": "test",
>    "version": "1.0.0",
>    "main": "index.js",
>    "license": "MIT",
>    "dependencies": {
>      "snyk": "^1.1291.0"
>    }
> }
> EOF
> yarn install
> 
> Modern Yarn will give the same result but with slightly different output.
> 
> This also appears to affect node-gyp: https://github.com/nodejs/node/issues/53051
> 
> See also: https://bugs.gentoo.org/931942

This looks like a timing alteration due to task_work being done
differently, from a quick look and guess. For some reason SQPOLL is
being used. I tried running it here, but it doesn't reproduce for me.
Tried both current -git and 6.9 as released. I'll try on x86 as well to
see if I can hit it.

Maybe someone can describe what is happening here? I'm assuming that
something is racing with an io_uring operation done by SQPOLL, which is
keeping the file open until done.

This may or may not be a kernel issue, depending on what assumptions are
being made on when operations are begun and completed. Maybe those
assumptions are correct and there is indeed a kernel regression here, or
maybe the user side is just buggy in its assumptions.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 16:16 ` [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f Jens Axboe
@ 2024-05-21 16:22   ` Jens Axboe
  2024-05-21 18:17     ` Christian Heusel
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 16:22 UTC (permalink / raw
  To: Andrew Udvare, regressions; +Cc: io-uring

On 5/21/24 10:16 AM, Jens Axboe wrote:
> (removing stable as this isn't a stable regression, and the commit
> itself isn't marked for stable to begin with).
> 
> On 5/21/24 10:02 AM, Andrew Udvare wrote:
>> #regzbot introduced: v6.8..v6.9-rc1
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=af5d68f8892f8ee8f137648b79ceb2abc153a19b
>>
>> Since the above commit present in 6.9+, Node running a Yarn installation that executes a subprocess always shows the following:
>>
>> /test # yarn --offline install
>> yarn install v1.22.22
>> warning package.json: "test" is also the name of a node core module
>> warning test@1.0.0: "test" is also the name of a node core module
>> [1/4] Resolving packages...
>> [2/4] Fetching packages...
>> [3/4] Linking dependencies...
>> [4/4] Building fresh packages...
>> error /test/node_modules/snyk: Command failed.
>> Exit code: 126
>> Command: node wrapper_dist/bootstrap.js exec
>> Arguments:
>> Directory: /test/node_modules/snyk
>> Output:
>> /bin/sh: node: Text file busy
>>
>> The commit was found by bisection with a simple initramfs that just runs 'yarn --offline install' with a test project and cached Yarn packages.
>>
>> To reproduce:
>>
>> npm install -g yarn
>> mkdir test
>> cd test
>> cat > package.json <<EOF
>> {
>>    "name": "test",
>>    "version": "1.0.0",
>>    "main": "index.js",
>>    "license": "MIT",
>>    "dependencies": {
>>      "snyk": "^1.1291.0"
>>    }
>> }
>> EOF
>> yarn install
>>
>> Modern Yarn will give the same result but with slightly different output.
>>
>> This also appears to affect node-gyp: https://github.com/nodejs/node/issues/53051
>>
>> See also: https://bugs.gentoo.org/931942
> 
> This looks like a timing alteration due to task_work being done
> differently, from a quick look and guess. For some reason SQPOLL is
> being used. I tried running it here, but it doesn't reproduce for me.
> Tried both current -git and 6.9 as released. I'll try on x86 as well to
> see if I can hit it.

Reproduces on an x86-64 vm:

[root@archlinux test]# yarn install 
yarn install v1.22.22
warning package.json: "test" is also the name of a node core module
info No lockfile found.
warning test@1.0.0: "test" is also the name of a node core module
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
error /root/test/node_modules/snyk: Command failed.
Exit code: 126
Command: node wrapper_dist/bootstrap.js exec
Arguments: 
Directory: /root/test/node_modules/snyk
Output:
/bin/sh: /tmp/yarn--1716308447842-0.8496252120161716/node: /bin/sh: bad interpreter: Text file busy

I'll poke a bit.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 16:22   ` Jens Axboe
@ 2024-05-21 18:17     ` Christian Heusel
  2024-05-21 18:25       ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Heusel @ 2024-05-21 18:17 UTC (permalink / raw
  To: Jens Axboe; +Cc: Andrew Udvare, regressions, io-uring

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

On 24/05/21 10:22AM, Jens Axboe wrote:
> > On 5/21/24 10:02 AM, Andrew Udvare wrote:
> >> #regzbot introduced: v6.8..v6.9-rc1
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=af5d68f8892f8ee8f137648b79ceb2abc153a19b
> >>
> >> Since the above commit present in 6.9+, Node running a Yarn installation that executes a subprocess always shows the following:
> >>
> >> This also appears to affect node-gyp: https://github.com/nodejs/node/issues/53051
> >>
> >> See also: https://bugs.gentoo.org/931942
> > 
> > This looks like a timing alteration due to task_work being done
> > differently, from a quick look and guess. For some reason SQPOLL is
> > being used. I tried running it here, but it doesn't reproduce for me.
> > Tried both current -git and 6.9 as released. I'll try on x86 as well to
> > see if I can hit it.
> 

It seems like this also was a problem for libuv previously as somone
noted in a comment on the Arch Linux Bugtracker[0]:

- https://github.com/libuv/libuv/commit/1752791c9ea89dbf54e2a20a9d9f899119a2d179
- https://github.com/libuv/libuv/blob/v1.48.0/src/unix/linux.c#L834

Cheers,
Chris

[0]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/55#note_186921

#regzbot introduced: af5d68f8892f8ee8f137648b79ceb2abc153a19b
#regzbot link: https://bugs.gentoo.org/931942
#regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/55


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 18:17     ` Christian Heusel
@ 2024-05-21 18:25       ` Jens Axboe
  2024-05-21 18:29         ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 18:25 UTC (permalink / raw
  To: Christian Heusel; +Cc: Andrew Udvare, regressions, io-uring

On 5/21/24 12:17 PM, Christian Heusel wrote:
> On 24/05/21 10:22AM, Jens Axboe wrote:
>>> On 5/21/24 10:02 AM, Andrew Udvare wrote:
>>>> #regzbot introduced: v6.8..v6.9-rc1
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=af5d68f8892f8ee8f137648b79ceb2abc153a19b
>>>>
>>>> Since the above commit present in 6.9+, Node running a Yarn installation that executes a subprocess always shows the following:
>>>>
>>>> This also appears to affect node-gyp: https://github.com/nodejs/node/issues/53051
>>>>
>>>> See also: https://bugs.gentoo.org/931942
>>>
>>> This looks like a timing alteration due to task_work being done
>>> differently, from a quick look and guess. For some reason SQPOLL is
>>> being used. I tried running it here, but it doesn't reproduce for me.
>>> Tried both current -git and 6.9 as released. I'll try on x86 as well to
>>> see if I can hit it.
>>
> 
> It seems like this also was a problem for libuv previously as somone
> noted in a comment on the Arch Linux Bugtracker[0]:
> 
> - https://github.com/libuv/libuv/commit/1752791c9ea89dbf54e2a20a9d9f899119a2d179
> - https://github.com/libuv/libuv/blob/v1.48.0/src/unix/linux.c#L834

Just got that far in my local poking, it is indeed execve() getting
ETXTBUSY. I'll see if I can make a test case for this. Presumably the
application side is fine, eg it waits on the close completion before
doing the execve, not just having it be issued. Outside of that, only
other thing I can think of is that the final close would be punted to
task_work by fput(), which means there's also a dependency on the task
having run its kernel task_work before it's fully closed.

I'll be back with more info...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 18:25       ` Jens Axboe
@ 2024-05-21 18:29         ` Jens Axboe
  2024-05-21 19:05           ` Jens Axboe
                             ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 18:29 UTC (permalink / raw
  To: Christian Heusel; +Cc: Andrew Udvare, regressions, io-uring

On 5/21/24 12:25 PM, Jens Axboe wrote:
> Outside of that, only other thing I can think of is that the final
> close would be punted to task_work by fput(), which means there's also
> a dependency on the task having run its kernel task_work before it's
> fully closed.

Yep I think that's it, the below should fix it.


diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 554c7212aa46..68a3e3290411 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -241,6 +241,8 @@ static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
 			return count;
 		max_entries -= count;
 	}
+	if (task_work_pending(current))
+		task_work_run();
 
 	*retry_list = tctx_task_work_run(tctx, max_entries, &count);
 	return count;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 18:29         ` Jens Axboe
@ 2024-05-21 19:05           ` Jens Axboe
  2024-05-21 19:07           ` Christian Heusel
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 19:05 UTC (permalink / raw
  To: Christian Heusel; +Cc: Andrew Udvare, regressions, io-uring

On 5/21/24 12:29 PM, Jens Axboe wrote:
> On 5/21/24 12:25 PM, Jens Axboe wrote:
>> Outside of that, only other thing I can think of is that the final
>> close would be punted to task_work by fput(), which means there's also
>> a dependency on the task having run its kernel task_work before it's
>> fully closed.
> 
> Yep I think that's it, the below should fix it.

Sent it out and also wrote a test case to catch this:

https://git.kernel.dk/cgit/liburing/commit/?id=06c22ef6637284ab1f31ee64f1ee48a829958816

just in case we ever regress in that manner again.

This will go into the 6.10-rc1 release, will ship it to Linus in a day or
two. And then it'll land in the next stable release of 6.9 as well.

Thanks for the report!

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 18:29         ` Jens Axboe
  2024-05-21 19:05           ` Jens Axboe
@ 2024-05-21 19:07           ` Christian Heusel
  2024-05-21 19:11             ` Jens Axboe
  2024-05-21 19:30           ` Andrew Udvare
  2024-05-21 19:33           ` Andrew Udvare
  3 siblings, 1 reply; 12+ messages in thread
From: Christian Heusel @ 2024-05-21 19:07 UTC (permalink / raw
  To: Jens Axboe; +Cc: Andrew Udvare, regressions, io-uring

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

On 24/05/21 12:29PM, Jens Axboe wrote:
> On 5/21/24 12:25 PM, Jens Axboe wrote:
> > Outside of that, only other thing I can think of is that the final
> > close would be punted to task_work by fput(), which means there's also
> > a dependency on the task having run its kernel task_work before it's
> > fully closed.
> 
> Yep I think that's it, the below should fix it.

It indeed does! I applied the patch on top of Linus tree and the issue
is now gone!

Tested-by: Christian Heusel <christian@heusel.eu>

Cheers,
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 19:07           ` Christian Heusel
@ 2024-05-21 19:11             ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 19:11 UTC (permalink / raw
  To: Christian Heusel; +Cc: Andrew Udvare, regressions, io-uring

On 5/21/24 1:07 PM, Christian Heusel wrote:
> On 24/05/21 12:29PM, Jens Axboe wrote:
>> On 5/21/24 12:25 PM, Jens Axboe wrote:
>>> Outside of that, only other thing I can think of is that the final
>>> close would be punted to task_work by fput(), which means there's also
>>> a dependency on the task having run its kernel task_work before it's
>>> fully closed.
>>
>> Yep I think that's it, the below should fix it.
> 
> It indeed does! I applied the patch on top of Linus tree and the issue
> is now gone!
> 
> Tested-by: Christian Heusel <christian@heusel.eu>

Thanks for testing! I'll add your tag.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 18:29         ` Jens Axboe
  2024-05-21 19:05           ` Jens Axboe
  2024-05-21 19:07           ` Christian Heusel
@ 2024-05-21 19:30           ` Andrew Udvare
  2024-05-21 19:32             ` Jens Axboe
  2024-05-21 19:33           ` Andrew Udvare
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Udvare @ 2024-05-21 19:30 UTC (permalink / raw
  To: Jens Axboe, Christian Heusel; +Cc: regressions, io-uring


[-- Attachment #1.1: Type: text/plain, Size: 1374 bytes --]

On 21/05/2024 14:29, Jens Axboe wrote:
> On 5/21/24 12:25 PM, Jens Axboe wrote:
>> Outside of that, only other thing I can think of is that the final
>> close would be punted to task_work by fput(), which means there's also
>> a dependency on the task having run its kernel task_work before it's
>> fully closed.
> 
> Yep I think that's it, the below should fix it.
> 
> 
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index 554c7212aa46..68a3e3290411 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -241,6 +241,8 @@ static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
>   			return count;
>   		max_entries -= count;
>   	}
> +	if (task_work_pending(current))
> +		task_work_run();
>   
>   	*retry_list = tctx_task_work_run(tctx, max_entries, &count);
>   	return count;
> 

This patch works for me on 6.9.1.

  $ yarn
yarn install v1.22.22
warning package.json: "test" is also the name of a node core module
info No lockfile found.
warning test@1.0.0: "test" is also the name of a node core module
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
success Saved lockfile.
Done in 3.32s.

  $ uname -a
Linux limelight 6.9.1-gentoo-limelight #2 SMP PREEMPT_DYNAMIC TKG Tue 
May 21 15:21:33 EDT 2024

--
Andrew

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 19:30           ` Andrew Udvare
@ 2024-05-21 19:32             ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 19:32 UTC (permalink / raw
  To: Andrew Udvare, Christian Heusel; +Cc: regressions, io-uring

On 5/21/24 1:30 PM, Andrew Udvare wrote:
> On 21/05/2024 14:29, Jens Axboe wrote:
>> On 5/21/24 12:25 PM, Jens Axboe wrote:
>>> Outside of that, only other thing I can think of is that the final
>>> close would be punted to task_work by fput(), which means there's also
>>> a dependency on the task having run its kernel task_work before it's
>>> fully closed.
>>
>> Yep I think that's it, the below should fix it.
>>
>>
>> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
>> index 554c7212aa46..68a3e3290411 100644
>> --- a/io_uring/sqpoll.c
>> +++ b/io_uring/sqpoll.c
>> @@ -241,6 +241,8 @@ static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
>>               return count;
>>           max_entries -= count;
>>       }
>> +    if (task_work_pending(current))
>> +        task_work_run();
>>         *retry_list = tctx_task_work_run(tctx, max_entries, &count);
>>       return count;
>>
> 
> This patch works for me on 6.9.1.
> 
>  $ yarn
> yarn install v1.22.22
> warning package.json: "test" is also the name of a node core module
> info No lockfile found.
> warning test@1.0.0: "test" is also the name of a node core module
> [1/4] Resolving packages...
> [2/4] Fetching packages...
> [3/4] Linking dependencies...
> [4/4] Building fresh packages...
> success Saved lockfile.
> Done in 3.32s.
> 
>  $ uname -a
> Linux limelight 6.9.1-gentoo-limelight #2 SMP PREEMPT_DYNAMIC TKG Tue May 21 15:21:33 EDT 2024

Great, thank you for testing!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 18:29         ` Jens Axboe
                             ` (2 preceding siblings ...)
  2024-05-21 19:30           ` Andrew Udvare
@ 2024-05-21 19:33           ` Andrew Udvare
  2024-05-21 19:36             ` Jens Axboe
  3 siblings, 1 reply; 12+ messages in thread
From: Andrew Udvare @ 2024-05-21 19:33 UTC (permalink / raw
  To: Jens Axboe, Christian Heusel; +Cc: regressions, io-uring


[-- Attachment #1.1: Type: text/plain, Size: 422 bytes --]

On 21/05/2024 14:29, Jens Axboe wrote:
> On 5/21/24 12:25 PM, Jens Axboe wrote:
>> Outside of that, only other thing I can think of is that the final
>> close would be punted to task_work by fput(), which means there's also
>> a dependency on the task having run its kernel task_work before it's
>> fully closed.
> 
> Yep I think that's it, the below should fix it.

Tested-by: Andrew Udvare <audvare@gmail.com>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f
  2024-05-21 19:33           ` Andrew Udvare
@ 2024-05-21 19:36             ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2024-05-21 19:36 UTC (permalink / raw
  To: Andrew Udvare, Christian Heusel; +Cc: regressions, io-uring

On 5/21/24 1:33 PM, Andrew Udvare wrote:
> On 21/05/2024 14:29, Jens Axboe wrote:
>> On 5/21/24 12:25 PM, Jens Axboe wrote:
>>> Outside of that, only other thing I can think of is that the final
>>> close would be punted to task_work by fput(), which means there's also
>>> a dependency on the task having run its kernel task_work before it's
>>> fully closed.
>>
>> Yep I think that's it, the below should fix it.
> 
> Tested-by: Andrew Udvare <audvare@gmail.com>

Added, thanks.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-05-21 19:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <313824bc-799d-414f-96b7-e6de57c7e21d@gmail.com>
2024-05-21 16:16 ` [REGRESSION] ETXTBSY when running Yarn (Node) since af5d68f Jens Axboe
2024-05-21 16:22   ` Jens Axboe
2024-05-21 18:17     ` Christian Heusel
2024-05-21 18:25       ` Jens Axboe
2024-05-21 18:29         ` Jens Axboe
2024-05-21 19:05           ` Jens Axboe
2024-05-21 19:07           ` Christian Heusel
2024-05-21 19:11             ` Jens Axboe
2024-05-21 19:30           ` Andrew Udvare
2024-05-21 19:32             ` Jens Axboe
2024-05-21 19:33           ` Andrew Udvare
2024-05-21 19:36             ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).