From: Jan Kiszka <jan.kiszka@siemens.com>
To: Florian Bezdeka <florian.bezdeka@siemens.com>, xenomai@lists.linux.dev
Cc: Clara Kowalsky <clara.kowalsky@siemens.com>
Subject: Re: Problem with rt_task_set_affinity() in combination with xenomai.supported_cpus
Date: Wed, 10 Apr 2024 15:57:52 +0200
Message-ID: <c659e50a-f3bc-474b-b8b1-808a2c99a0f2@siemens.com>
In-Reply-To: <0c9e4bbe6a68452f2ab6c5526609de514ebb2b3a.camel@siemens.com>

On 10.04.24 13:59, Florian Bezdeka wrote:
> On Wed, 2024-04-10 at 13:37 +0200, Florian Bezdeka wrote:
>> On Wed, 2024-04-10 at 12:49 +0200, Jan Kiszka wrote:
>>> On 10.04.24 12:09, Florian Bezdeka wrote:
>>>> On Tue, 2024-04-09 at 19:13 +0200, Jan Kiszka wrote:
>>>>> On 09.04.24 13:56, Florian Bezdeka wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> while looking into recent bug reports, we ran into a situation where
>>>>>> the provided test applications / reproducers behaved completely
>>>>>> differently for us than for the original reporter.
>>>>>>
>>>>>> It turned out that the reproducers assumed that specific CPUs are
>>>>>> allowed to run Xenomai applications, which was not the case. The
>>>>>> cmdline parameter xenomai.supported_cpus=0x3 (= allow the first 2 CPUs
>>>>>> to serve RT tasks) had been used here to shrink the set of CPUs that
>>>>>> are allowed to run RT tasks.
>>>>>>
>>>>>> Example (from memory, error handling omitted):
>>>>>>
>>>>>> #define _GNU_SOURCE 1	/* for CPU_ZERO()/CPU_SET() */
>>>>>> #include <sched.h>
>>>>>> #include <alchemy/task.h>
>>>>>>
>>>>>> void rt_main(void *arg)
>>>>>> {
>>>>>> 	// do RT stuff
>>>>>> }
>>>>>>
>>>>>> int main(void)
>>>>>> {
>>>>>> 	RT_TASK demo_task;
>>>>>> 	cpu_set_t mask;
>>>>>>
>>>>>> 	CPU_ZERO(&mask);
>>>>>> 	CPU_SET(3, &mask); // Assumption: CPU 3 is allowed to run
>>>>>> 	                   // RT tasks
>>>>>>
>>>>>> 	rt_task_create(&demo_task, "demo", 0, 50, 0);
>>>>>>
>>>>>> 	rt_task_set_affinity(&demo_task, &mask);
>>>>>>
>>>>>> 	rt_task_start(&demo_task, &rt_main, 0);
>>>>>>
>>>>>> 	return 0;
>>>>>> }
>>>>>>
>>>>>> rt_task_set_affinity() does not report any error in case the supplied
>>>>>> CPU mask is "invalid" in the sense that it contains no usable CPU. If
>>>>>> such an affinity has been set, rt_task_start() will block forever.
>>>>>
>>>>> But the task was already created and is only waiting for the kick-off
>>>>> by rt_task_start(). What happens to it?
>>>>
>>>> The thread is canceled inside the wakeup path to the primary domain: 
>>>>
>>>> [510675.581142] [Xenomai] thread demo[1570] switched to non-rt CPU3, aborted.
>>>>
>>>> The caller of rt_task_start() will block forever:
>>>>
>>>> #0  0xf7fc7579 in __kernel_vsyscall ()
>>>> #1  0xf7e544e7 in syscall () from /lib/i386-linux-gnu/libc.so.6
>>>> #2  0xf7f74791 in do_sc_cond_wait_prologue (cnd=0xf746b82c, mx=0xf746b758, 
>>>>     err=0xffffd9a4, timed=0, abstime=0x0) at cond.c:243
>>>> #3  0xf7f74915 in __cobalt_pthread_cond_wait (cond=0xf746b82c, 
>>>>     mutex=0xf746b758) at cond.c:329
>>>> #4  0xf7f9db5a in threadobj_cond_wait (cond=0xf746b82c, lock=0xf746b758)
>>>>     at ../../include/copperplate/threadobj.h:572
>>>> #5  0xf7f9e9a8 in wait_on_barrier (thobj=0xf746b750, mask=16)
>>>>     at threadobj.c:1237
>>>> #6  0xf7f9ea92 in threadobj_start (thobj=0xf746b750) at threadobj.c:1274
>>>> #7  0xf7fb2464 in rt_task_start (task=0xffffdb60, entry=0x5655647a <rt_main>, 
>>>>     arg=0x0) at task.c:644
>>>> #8  0x565565b1 in main ()
>>>>
>>>>>
>>>>>>
>>>>>> How should the application know that the affinity is invalid?
>>>>>>
>>>>>> My current understanding is that there is no hook (provided by
>>>>>> dovetail) that would allow intercepting the sched_setaffinity() call
>>>>>> that is ultimately used.
>>>>>>
>>>>>> Should we implement such a hook? Is there something that I missed?
>>>>>>
>>>>>
>>>>> We could check the mask from userspace in libalchemy. That would remain
>>>>> racy, though, as the set of supported CPUs could still change after
>>>>> set_affinity and before start.
>>>>>
>>>>> Do we only have the problem in libalchemy due to its split create/start
>>>>> pattern?
>>>>
>>>> AFAICT this is a general issue, not limited to the alchemy skin. We
>>>> never intercept sched_setaffinity(), so it's basically a pure Linux
>>>> API.
>>>>
>>>> I would say - as it is a pure Linux API - that we cannot validate
>>>> affinities against the Xenomai-supported CPUs. We could implement
>>>> something for the alchemy skin, but the POSIX skin would need an
>>>> additional wrapper.
>>>>
>>>> We have /proc/xenomai/affinity, which might be useful. Not sure if we
>>>> can read that without triggering a Linux migration. IMHO, validation on
>>>> the kernel side would be better. At least it would avoid one more wrapper.
>>>
>>> Not an issue: sched_setaffinity & Co. are not intercepted, so they
>>> trigger a migration anyway.
>>
>> The remaining open topic would be: how to do that for the POSIX skin? We
>> don't have any kind of interception available for sched_setaffinity()
>> right now.
>>
>> I see two possibilities:
>> a) add an additional wrapper
>> b) validation (by implementing some kind of additional hook into the
>> cobalt core) on the kernel side
>>
>> Validation in userspace is racier, but does not require any dovetail
>> extension...
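
To make a) a bit more concrete - a rough, untested sketch of a --wrap
style interposer, along the lines of what libcobalt already does for
other libc calls. cobalt_cpu_affinity() is a made-up placeholder here;
how to obtain the supported-CPU set (kernel query, /proc/xenomai/affinity,
...) is exactly the open question:

#define _GNU_SOURCE 1
#include <sched.h>
#include <errno.h>

/* Hypothetical helper returning the set of CPUs Xenomai supports. */
extern int cobalt_cpu_affinity(cpu_set_t *set);

extern int __real_sched_setaffinity(pid_t pid, size_t cpusetsize,
				    const cpu_set_t *mask);

int __wrap_sched_setaffinity(pid_t pid, size_t cpusetsize,
			     const cpu_set_t *mask)
{
	cpu_set_t supported, usable;

	/* Reject masks that do not contain any supported CPU. */
	if (cobalt_cpu_affinity(&supported) == 0) {
		CPU_AND(&usable, mask, &supported);
		if (CPU_COUNT(&usable) == 0) {
			errno = EINVAL;
			return -1;
		}
	}

	/* Hand the unmodified mask to the regular libc/Linux path. */
	return __real_sched_setaffinity(pid, cpusetsize, mask);
}

That would still be racy against later changes of the supported set, as
noted above.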

I'm not yet saying that this is the best option - it's one. We could
also think about adding an in-kernel hook but that won't be simple either.
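
And for reference, the purely userspace variant could look roughly like
this - assuming /proc/xenomai/affinity exposes the supported-CPU set as a
hex mask, which I have not double-checked:

#define _GNU_SOURCE 1
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>

/* Sketch only: check that the requested mask contains at least one CPU
 * that Xenomai supports. The format of /proc/xenomai/affinity is an
 * assumption, not verified. */
static bool affinity_is_usable(const cpu_set_t *mask)
{
	unsigned long supported;
	FILE *f = fopen("/proc/xenomai/affinity", "r");
	int cpu, ok;

	if (f == NULL)
		return true;	/* cannot tell, do not reject */

	ok = fscanf(f, "%lx", &supported);
	fclose(f);
	if (ok != 1)
		return true;

	for (cpu = 0; cpu < (int)(8 * sizeof(supported)); cpu++)
		if (CPU_ISSET(cpu, mask) && (supported & (1UL << cpu)))
			return true;

	return false;
}

An application could run such a check before rt_task_set_affinity() or a
plain sched_setaffinity(), but that obviously shares the race window
discussed above.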

Jan

-- 
Siemens AG, Technology
Linux Expert Center

