* drm scheduler and wq flavours
@ 2024-05-02 14:33 Tvrtko Ursulin
2024-05-06 23:23 ` Matthew Brost
0 siblings, 1 reply; 4+ messages in thread
From: Tvrtko Ursulin @ 2024-05-02 14:33 UTC
To: Daniel Vetter, Rob Clark, Matthew Brost; +Cc: dri-devel@lists.freedesktop.org
Hi all,
Continuing after the brief IRC discussion yesterday regarding work
queues being prone to deadlocks or not, I had a browse around the code
base and ended up a bit confused.
When drm_sched_init documents and allocates an *ordered* wq, if no
custom one was provided, could someone remind me whether the ordered
property was fundamental for something to work correctly? Like run_job
vs free_job ordering?
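For intuition, the property in question can be modeled in userspace (this is a Python analogy for illustration only, not kernel code; the class and helper names below are made up): an ordered wq behaves like a single worker draining a FIFO, so run_job and free_job items queued on it never run concurrently and always execute in submission order.

```python
import threading
import queue

class OrderedWq:
    """Userspace analogy of an ordered workqueue: one worker thread
    drains a FIFO, so queued items never run concurrently and always
    execute in submission order."""

    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            fn = self._q.get()
            if fn is None:
                break
            fn()  # items execute strictly one at a time, in FIFO order

    def queue_work(self, fn):
        self._q.put(fn)

    def drain_and_stop(self):
        self._q.put(None)
        self._worker.join()

log = []
wq = OrderedWq()
for i in range(3):
    # run_job and free_job for each job are separate work items,
    # but the single ordered queue serializes them.
    wq.queue_work(lambda i=i: log.append(f"run_job {i}"))
    wq.queue_work(lambda i=i: log.append(f"free_job {i}"))
wq.drain_and_stop()
print(log)
```

On an unordered (multi-worker) queue, nothing in this model would stop free_job for one item overlapping run_job for the next, which is the ordering question at issue.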
I ask because it appears different drivers do different things and at
the moment it looks like we have all possible combos of
ordered/unordered, bound and unbound, shared or not shared with the
timeout wq, or even unbound for the timeout wq.
The drivers worth looking at in this respect are probably nouveau,
panthor, pvr and xe.
Nouveau also talks about a dependency between run_job and free_job and
goes on to create two unordered wqs.
Then xe looks a bit funky with the workaround/hack for lockdep, where
it creates 512 work queues and hands them over to user queues in
round-robin fashion (instead of the default 1:1). Which I suspect is a
problem that should be applicable to any 1:1 driver given a thorough
enough test suite.
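The pool-plus-round-robin idea itself is simple to sketch (hypothetical names, userspace Python for illustration; xe's actual implementation differs): a fixed pool of queues is handed out cyclically, keeping the number of distinct queues — and hence distinct lockdep keys — bounded regardless of how many user queues are created.

```python
import itertools

class WqPool:
    """Hypothetical sketch of a fixed pool of work queues handed out
    round-robin, instead of allocating one queue per user queue
    (the 1:1 default)."""

    def __init__(self, size):
        self.queues = [f"wq-{i}" for i in range(size)]
        self._next = itertools.cycle(range(size))

    def assign(self):
        # Each new user queue gets the next pool slot, wrapping around.
        return self.queues[next(self._next)]

pool = WqPool(4)
assignments = [pool.assign() for _ in range(10)]
print(assignments)  # wq-0, wq-1, wq-2, wq-3, wq-0, ...
```

The trade-off is that unrelated user queues sharing a pool slot now also share that slot's ordering and contention.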
So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?
Regards,
Tvrtko
* Re: drm scheduler and wq flavours
2024-05-02 14:33 drm scheduler and wq flavours Tvrtko Ursulin
@ 2024-05-06 23:23 ` Matthew Brost
2024-05-07 9:09 ` Tvrtko Ursulin
0 siblings, 1 reply; 4+ messages in thread
From: Matthew Brost @ 2024-05-06 23:23 UTC
To: Tvrtko Ursulin; +Cc: Daniel Vetter, Rob Clark, dri-devel@lists.freedesktop.org
On Thu, May 02, 2024 at 03:33:50PM +0100, Tvrtko Ursulin wrote:
>
> Hi all,
>
> Continuing after the brief IRC discussion yesterday regarding work queues
> being prone to deadlocks or not, I had a browse around the code base and
> ended up a bit confused.
>
> When drm_sched_init documents and allocates an *ordered* wq, if no custom
> one was provided, could someone remind me whether the ordered property was
> fundamental for something to work correctly? Like run_job vs free_job
> ordering?
>
Before the work queue conversion (the old kthread design), run_job &
free_job were ordered. It was decided not to break this existing behavior.
> I ask because it appears different drivers do different things and at the
> moment it looks like we have all possible combos of ordered/unordered,
> bound and unbound, shared or not shared with the timeout wq, or even
> unbound for the timeout wq.
>
> The drivers worth looking at in this respect are probably nouveau, panthor,
> pvr and xe.
>
> Nouveau also talks about a dependency between run_job and free_job and
> goes on to create two unordered wqs.
>
> Then xe looks a bit funky with the workaround/hack for lockdep, where it
> creates 512 work queues and hands them over to user queues in round-robin
> fashion (instead of the default 1:1). Which I suspect is a problem that
> should be applicable to any 1:1 driver given a thorough enough test suite.
>
I think lockdep ran out of chains or something when executing some wild
IGT with 1:1. Yes, any driver with a wild enough test would likely hit
this lockdep splat too. Using a pool is probably not a bad idea either.
> So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?
>
Default ordered, driver can override with unordered.
Matt
> Regards,
>
> Tvrtko
* Re: drm scheduler and wq flavours
2024-05-06 23:23 ` Matthew Brost
@ 2024-05-07 9:09 ` Tvrtko Ursulin
2024-05-08 19:07 ` Matthew Brost
0 siblings, 1 reply; 4+ messages in thread
From: Tvrtko Ursulin @ 2024-05-07 9:09 UTC
To: Matthew Brost; +Cc: Daniel Vetter, Rob Clark, dri-devel@lists.freedesktop.org
On 07/05/2024 00:23, Matthew Brost wrote:
> On Thu, May 02, 2024 at 03:33:50PM +0100, Tvrtko Ursulin wrote:
>>
>> Hi all,
>>
>> Continuing after the brief IRC discussion yesterday regarding work queues
>> being prone to deadlocks or not, I had a browse around the code base and
>> ended up a bit confused.
>>
>> When drm_sched_init documents and allocates an *ordered* wq, if no custom
>> one was provided, could someone remind me whether the ordered property was
>> fundamental for something to work correctly? Like run_job vs free_job
>> ordering?
>>
>
> Before the work queue conversion (the old kthread design), run_job &
> free_job were ordered. It was decided not to break this existing behavior.
Simply for extra paranoia, or do you remember if there was a reason identified?
>> I ask because it appears different drivers do different things and at the
>> moment it looks like we have all possible combos of ordered/unordered,
>> bound and unbound, shared or not shared with the timeout wq, or even
>> unbound for the timeout wq.
>>
>> The drivers worth looking at in this respect are probably nouveau, panthor,
>> pvr and xe.
>>
>> Nouveau also talks about a dependency between run_job and free_job and
>> goes on to create two unordered wqs.
>>
>> Then xe looks a bit funky with the workaround/hack for lockdep, where it
>> creates 512 work queues and hands them over to user queues in round-robin
>> fashion (instead of the default 1:1). Which I suspect is a problem that
>> should be applicable to any 1:1 driver given a thorough enough test suite.
>>
>
> I think lockdep ran out of chains or something when executing some wild
> IGT with 1:1. Yes, any driver with a wild enough test would likely hit
> this lockdep splat too. Using a pool is probably not a bad idea either.
I wonder what is different between that and having a single shared
unbound queue and letting the kernel manage the concurrency? Both this..
>> So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?
>>
>
> Default ordered, driver can override with unordered.
.. and this, go back to my original question - whether the default queue
must be ordered or not, or under which circumstances drivers can choose
unordered. I think in drm_sched_init, where the kerneldoc says it will
create an ordered queue, it would be good to document the rules.
Regards,
Tvrtko
* Re: drm scheduler and wq flavours
2024-05-07 9:09 ` Tvrtko Ursulin
@ 2024-05-08 19:07 ` Matthew Brost
0 siblings, 0 replies; 4+ messages in thread
From: Matthew Brost @ 2024-05-08 19:07 UTC
To: Tvrtko Ursulin; +Cc: Daniel Vetter, Rob Clark, dri-devel@lists.freedesktop.org
On Tue, May 07, 2024 at 10:09:18AM +0100, Tvrtko Ursulin wrote:
>
> On 07/05/2024 00:23, Matthew Brost wrote:
> > On Thu, May 02, 2024 at 03:33:50PM +0100, Tvrtko Ursulin wrote:
> > >
> > > Hi all,
> > >
> > > Continuing after the brief IRC discussion yesterday regarding work queues
> > > being prone to deadlocks or not, I had a browse around the code base and
> > > ended up a bit confused.
> > >
> > > When drm_sched_init documents and allocates an *ordered* wq, if no custom
> > > one was provided, could someone remind me whether the ordered property was
> > > fundamental for something to work correctly? Like run_job vs free_job
> > > ordering?
> > >
> >
> > Before the work queue conversion (the old kthread design), run_job &
> > free_job were ordered. It was decided not to break this existing behavior.
>
> Simply for extra paranoia, or do you remember if there was a reason identified?
>
Not to break existing behavior. I can dig up the entire thread for
reference if needed.
> > > I ask because it appears different drivers do different things and at the
> > > moment it looks like we have all possible combos of ordered/unordered,
> > > bound and unbound, shared or not shared with the timeout wq, or even
> > > unbound for the timeout wq.
> > >
> > > The drivers worth looking at in this respect are probably nouveau, panthor,
> > > pvr and xe.
> > >
> > > Nouveau also talks about a dependency between run_job and free_job and
> > > goes on to create two unordered wqs.
> > >
> > > Then xe looks a bit funky with the workaround/hack for lockdep, where it
> > > creates 512 work queues and hands them over to user queues in round-robin
> > > fashion (instead of the default 1:1). Which I suspect is a problem that
> > > should be applicable to any 1:1 driver given a thorough enough test suite.
> > >
> >
> > I think lockdep ran out of chains or something when executing some wild
> > IGT with 1:1. Yes, any driver with a wild enough test would likely hit
> > this lockdep splat too. Using a pool is probably not a bad idea either.
>
> I wonder what is different between that and having a single shared unbound
> queue and letting the kernel manage the concurrency? Both this..
>
Each action (run_job, free_job, and the Xe-specific process-message work)
has its own work item on the DRM scheduler work queue. In Xe, these
operations must be ordered, or strictly speaking, not executed in parallel
within the DRM sched entity/scheduler. With a single shared unbound queue,
this breaks.
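The invariant described above can be sketched in userspace (a Python analogy with hypothetical helper names, not Xe code): a single-worker queue structurally caps per-entity concurrency at one, while a shared pool can run two of the same entity's work items at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def max_concurrency(executor, n_items, body):
    """Submit n_items work items and record the peak number of items
    running at the same time."""
    lock = threading.Lock()
    state = {"cur": 0, "peak": 0}
    def item():
        with lock:
            state["cur"] += 1
            state["peak"] = max(state["peak"], state["cur"])
        body()
        with lock:
            state["cur"] -= 1
    futures = [executor.submit(item) for _ in range(n_items)]
    for f in futures:
        f.result()
    return state["peak"]

# Ordered-queue analogy: a single worker can never run two of the
# entity's work items (run_job, free_job, process msg) in parallel.
with ThreadPoolExecutor(max_workers=1) as ordered:
    peak = max_concurrency(ordered, 8, lambda: None)
print(peak)  # always 1: the per-entity ordering invariant holds

# Shared-pool analogy: a barrier forces two of the same entity's
# items to be in flight at once, which is exactly the parallelism
# the ordered queue rules out.
barrier = threading.Barrier(2)
with ThreadPoolExecutor(max_workers=2) as unbound:
    peak2 = max_concurrency(unbound, 2, barrier.wait)
print(peak2)
```

A shared unbound wq in the kernel is not forced to interleave like the barrier does here, but nothing prevents it either, which is why the invariant cannot rely on it.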
> > > So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?
> > >
> >
> > Default ordered, driver can override with unordered.
>
> .. and this, go back to my original question - whether the default queue
> must be ordered or not, or under which circumstances drivers can choose
> unordered. I think in drm_sched_init, where the kerneldoc says it will
> create an ordered queue, it would be good to document the rules.
>
Sure. Let me write something up.
Matt
> Regards,
>
> Tvrtko