* drm scheduler and wq flavours
From: Tvrtko Ursulin @ 2024-05-02 14:33 UTC
  To: Daniel Vetter, Rob Clark, Matthew Brost; +Cc: dri-devel@lists.freedesktop.org


Hi all,

Continuing after the brief IRC discussion yesterday regarding whether or 
not work queues are prone to deadlocks, I had a browse around the code 
base and ended up a bit confused.

drm_sched_init documents and allocates an *ordered* wq when no custom 
one was provided - could someone remind me whether the ordered property 
is fundamental for something to work correctly? Like run_job vs 
free_job ordering?
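
For reference, the fallback I mean looks roughly like this (paraphrased 
sketch, not the verbatim kernel code):

	/* drm_sched_init(), sketch: */
	if (submit_wq) {
		/* driver-provided wq, any flavour the driver likes */
		sched->submit_wq = submit_wq;
	} else {
		/* default: a dedicated *ordered* wq per scheduler */
		sched->submit_wq = alloc_ordered_workqueue(name, 0);
		if (!sched->submit_wq)
			return -ENOMEM;
	}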

I ask because it appears different drivers do different things and at 
the moment it looks like we have all possible combos of 
ordered/unordered, bound and unbound, shared or not shared with the 
timeout wq, or even unbound for the timeout wq.

The drivers worth looking at in this respect are probably nouveau, 
panthor, pvr and xe.

Nouveau also talks about a dependency between run_job and free_job and 
goes on to create two unordered wqs.

Then xe looks a bit funky with the workaround/hack for lockdep, where 
it creates 512 work queues and hands them out to user queues in 
round-robin fashion (instead of the default 1:1). I suspect that is a 
problem applicable to any 1:1 driver given a thorough enough test 
suite.
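
To illustrate (hypothetical sketch, names are mine and not the actual 
xe code):

	#define SUBMIT_WQ_POOL_SIZE 512

	static struct workqueue_struct *submit_wq_pool[SUBMIT_WQ_POOL_SIZE];
	static atomic_t submit_wq_next;

	/* hand out pre-allocated ordered wqs round-robin, so lockdep
	 * sees a bounded number of wq lock classes instead of one per
	 * user queue */
	static struct workqueue_struct *submit_wq_get(void)
	{
		unsigned int i = atomic_inc_return(&submit_wq_next);

		return submit_wq_pool[i % SUBMIT_WQ_POOL_SIZE];
	}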

So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?

Regards,

Tvrtko


* Re: drm scheduler and wq flavours
From: Matthew Brost @ 2024-05-06 23:23 UTC
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Rob Clark, dri-devel@lists.freedesktop.org

On Thu, May 02, 2024 at 03:33:50PM +0100, Tvrtko Ursulin wrote:
> 
> Hi all,
> 
> Continuing after the brief IRC discussion yesterday regarding whether or
> not work queues are prone to deadlocks, I had a browse around the code
> base and ended up a bit confused.
> 
> drm_sched_init documents and allocates an *ordered* wq when no custom
> one was provided - could someone remind me whether the ordered property
> is fundamental for something to work correctly? Like run_job vs
> free_job ordering?
> 

Before the work queue conversion (i.e. the kthread design), run_job &
free_job were ordered. It was decided not to break this existing
behavior.
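
Concretely (sketch, field names from memory so possibly slightly off):
run_job and free_job are separate work items, but both are queued on
the same per-scheduler wq, and an ordered wq executes at most one item
at a time, so the two never run concurrently:

	queue_work(sched->submit_wq, &sched->work_run_job);
	queue_work(sched->submit_wq, &sched->work_free_job);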

> I ask because it appears different drivers do different things and at
> the moment it looks like we have all possible combos of
> ordered/unordered, bound and unbound, shared or not shared with the
> timeout wq, or even unbound for the timeout wq.
> 
> The drivers worth looking at in this respect are probably nouveau, panthor,
> pvr and xe.
> 
> Nouveau also talks about a dependency between run_job and free_job and
> goes on to create two unordered wqs.
> 
> Then xe looks a bit funky with the workaround/hack for lockdep, where
> it creates 512 work queues and hands them out to user queues in
> round-robin fashion (instead of the default 1:1). I suspect that is a
> problem applicable to any 1:1 driver given a thorough enough test
> suite.
> 

I think lockdep ran out of chains or something when executing some wild
IGT with 1:1. Yes, any driver with a wild enough test would likely hit
this lockdep splat too. Using a pool is probably not a bad idea either.

> So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?
>

Default ordered, driver can override with unordered.
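
E.g. roughly, on the driver side (sketch against the signature as I
remember it, exact parameters may differ):

	/* opt out of the ordering guarantee with a custom unbound wq;
	 * only safe if the driver knows run_job/free_job don't race */
	struct workqueue_struct *wq =
		alloc_workqueue("my-sched", WQ_UNBOUND, 0);

	drm_sched_init(sched, &ops, wq, num_rqs, credit_limit,
		       hang_limit, timeout, NULL, NULL, "my-sched", dev);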

Matt
 
> Regards,
> 
> Tvrtko


* Re: drm scheduler and wq flavours
From: Tvrtko Ursulin @ 2024-05-07  9:09 UTC
  To: Matthew Brost; +Cc: Daniel Vetter, Rob Clark, dri-devel@lists.freedesktop.org


On 07/05/2024 00:23, Matthew Brost wrote:
> On Thu, May 02, 2024 at 03:33:50PM +0100, Tvrtko Ursulin wrote:
>>
>> Hi all,
>>
>> Continuing after the brief IRC discussion yesterday regarding whether
>> or not work queues are prone to deadlocks, I had a browse around the
>> code base and ended up a bit confused.
>>
>> drm_sched_init documents and allocates an *ordered* wq when no custom
>> one was provided - could someone remind me whether the ordered
>> property is fundamental for something to work correctly? Like run_job
>> vs free_job ordering?
>>
> 
> Before the work queue conversion (i.e. the kthread design), run_job &
> free_job were ordered. It was decided not to break this existing
> behavior.

Simply for extra paranoia, or do you remember if a concrete reason was 
identified?

>> I ask because it appears different drivers do different things and at
>> the moment it looks like we have all possible combos of
>> ordered/unordered, bound and unbound, shared or not shared with the
>> timeout wq, or even unbound for the timeout wq.
>>
>> The drivers worth looking at in this respect are probably nouveau, panthor,
>> pvr and xe.
>>
>> Nouveau also talks about a dependency between run_job and free_job
>> and goes on to create two unordered wqs.
>>
>> Then xe looks a bit funky with the workaround/hack for lockdep, where
>> it creates 512 work queues and hands them out to user queues in
>> round-robin fashion (instead of the default 1:1). I suspect that is a
>> problem applicable to any 1:1 driver given a thorough enough test
>> suite.
>>
> 
> I think lockdep ran out of chains or something when executing some wild
> IGT with 1:1. Yes, any driver with a wild enough test would likely hit
> this lockdep splat too. Using a pool is probably not a bad idea either.

I wonder what is different between that and having a single shared 
unbound queue, letting the kernel manage the concurrency? Both this..
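
That is, something like (sketch, assuming the same init path and
signature as above):

	/* one unbound wq, shared by every scheduler instance, with
	 * concurrency left to the wq core */
	static struct workqueue_struct *shared_wq;

	shared_wq = alloc_workqueue("drm-sched-shared", WQ_UNBOUND, 0);

	/* then for each scheduler instance: */
	drm_sched_init(sched, &ops, shared_wq, num_rqs, credit_limit,
		       hang_limit, timeout, NULL, NULL, name, dev);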

>> So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?
>>
> 
> Default ordered, driver can override with unordered.

.. and this, go back to my original question - whether the default queue 
must be ordered or not, and under which circumstances drivers can choose 
unordered. I think in drm_sched_init, where the kerneldoc says it will 
create an ordered queue, it would be good to document the rules.

Regards,

Tvrtko


* Re: drm scheduler and wq flavours
From: Matthew Brost @ 2024-05-08 19:07 UTC
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Rob Clark, dri-devel@lists.freedesktop.org

On Tue, May 07, 2024 at 10:09:18AM +0100, Tvrtko Ursulin wrote:
> 
> On 07/05/2024 00:23, Matthew Brost wrote:
> > On Thu, May 02, 2024 at 03:33:50PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > Hi all,
> > > 
> > > Continuing after the brief IRC discussion yesterday regarding whether
> > > or not work queues are prone to deadlocks, I had a browse around the
> > > code base and ended up a bit confused.
> > > 
> > > drm_sched_init documents and allocates an *ordered* wq when no custom
> > > one was provided - could someone remind me whether the ordered
> > > property is fundamental for something to work correctly? Like run_job
> > > vs free_job ordering?
> > > 
> > 
> > Before the work queue conversion (i.e. the kthread design), run_job &
> > free_job were ordered. It was decided not to break this existing
> > behavior.
> 
> Simply for extra paranoia, or do you remember if a concrete reason was
> identified?
> 

Not to break existing behavior. I can dig up the entire thread for
reference if needed.

> > > I ask because it appears different drivers do different things and
> > > at the moment it looks like we have all possible combos of
> > > ordered/unordered, bound and unbound, shared or not shared with the
> > > timeout wq, or even unbound for the timeout wq.
> > > 
> > > The drivers worth looking at in this respect are probably nouveau, panthor,
> > > pvr and xe.
> > > 
> > > Nouveau also talks about a dependency between run_job and free_job
> > > and goes on to create two unordered wqs.
> > > 
> > > Then xe looks a bit funky with the workaround/hack for lockdep,
> > > where it creates 512 work queues and hands them out to user queues
> > > in round-robin fashion (instead of the default 1:1). I suspect that
> > > is a problem applicable to any 1:1 driver given a thorough enough
> > > test suite.
> > > 
> > 
> > I think lockdep ran out of chains or something when executing some
> > wild IGT with 1:1. Yes, any driver with a wild enough test would
> > likely hit this lockdep splat too. Using a pool is probably not a bad
> > idea either.
> 
> I wonder what is different between that and having a single shared
> unbound queue, letting the kernel manage the concurrency? Both this..
> 

Each action (run_job, free_job, and the Xe-specific process msg) has its
own work item on the DRM scheduler work queue. In Xe, these operations
must be ordered or, strictly speaking, not executed in parallel within a
DRM sched entity/scheduler. With a single shared unbound queue, this
breaks.
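
I.e. roughly (sketch, field names from memory):

	/* per-sched ordered wq: run_job -> free_job -> process_msg
	 * execute strictly one at a time. On a shared WQ_UNBOUND wq,
	 * max_active spans all schedulers, so these two items of the
	 * *same* scheduler may run in parallel on different CPUs -
	 * exactly what Xe relies on never happening. */
	queue_work(shared_wq, &sched->work_run_job);
	queue_work(shared_wq, &sched->work_free_job);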

> > > So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?
> > > 
> > 
> > Default ordered, driver can override with unordered.
> 
> .. and this, go back to my original question - whether the default
> queue must be ordered or not, and under which circumstances drivers can
> choose unordered. I think in drm_sched_init, where the kerneldoc says
> it will create an ordered queue, it would be good to document the
> rules.
>

Sure. Let me write something up.

Matt

> Regards,
> 
> Tvrtko

