From: Marek Olšák <maraeo@gmail.com>
Date: Thu, 3 Jun 2021 04:20:18 -0400
Subject: Re: [Mesa-dev] Linux Graphics Next: Userspace submission update
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Christian König, Michel Dänzer, dri-devel, Jason Ekstrand, ML Mesa-dev
List-Id: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>

On Thu, Jun 3, 2021 at 3:47 AM Daniel Vetter <daniel@ffwll.ch> wrote:

> On Wed, Jun 02, 2021 at 11:16:39PM -0400, Marek Olšák wrote:
> > On Wed, Jun 2, 2021 at 2:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Wed, Jun 02, 2021 at 05:38:51AM -0400, Marek Olšák wrote:
> > > > On Wed, Jun 2, 2021 at 5:34 AM Marek Olšák <maraeo@gmail.com> wrote:
> > > > > Yes, we can't break anything because we don't want to complicate
> > > > > things for us. It's pretty much all NAK'd already. We are trying
> > > > > to gather more knowledge and then make better decisions.
> > > > >
> > > > > The idea we are considering is that we'll expose memory-based
> > > > > sync objects to userspace as read-only, and the kernel or hw will
> > > > > strictly control the memory writes to those sync objects. The
> > > > > hole in that idea is that userspace can decide not to signal a
> > > > > job, so even if userspace can't overwrite memory-based sync
> > > > > object states arbitrarily, it can still decide not to signal
> > > > > them, and then a future fence is born.
> > > >
> > > > This would actually be treated as a GPU hang caused by that
> > > > context, so it should be fine.
> > >
> > > This is practically what I proposed already, except you're not doing
> > > it with dma_fence. And on the memory-fence side, this also doesn't
> > > actually give what you want for that compute model.
> > >
> > > This seems like a bit of a worst-of-both-worlds approach to me: tons
> > > of work in the kernel to hide these not-dma_fence-but-almost fences,
> > > and still pain to actually drive the hardware like it should be
> > > driven for compute or direct display.
> > >
> > > Also maybe I've missed it, but I didn't see any replies to my
> > > suggestion for how to fake the entire dma_fence stuff on top of new
> > > hw. It would be interesting to know what doesn't work there, instead
> > > of amd folks going off internal again and then coming back with
> > > another RFC out of nowhere. :-)
> >
> > Going internal again is probably a good idea to spare you the long
> > discussions and not waste your time, but we haven't talked about the
> > dma_fence stuff internally other than acknowledging that it can be
> > solved.
> >
> > The compute use case already uses the hw as-is with no inter-process
> > sharing, which mostly keeps the kernel out of the picture. It uses
> > glFinish to sync with GL.
> >
> > The gfx use case needs new hardware logic to support implicit and
> > explicit sync. When we propose a solution, it's usually torn apart the
> > next day by ourselves.
> >
> > Since we are talking about next hw or next-next hw, preemption should
> > be better.
> >
> > user queue = user-mapped ring buffer
> >
> > For implicit sync, we will only let userspace lock access to a buffer
> > via a user queue, which waits for the per-buffer sequence counter in
> > memory to be >= the number assigned by the kernel, and later unlock
> > the access with another command, which increments the per-buffer
> > sequence counter in memory with atomic_inc regardless of the number
> > assigned by the kernel. The kernel counter and the counter in memory
> > can be out of sync, and I'll explain why that's OK. If a process
> > increments the kernel counter but not the memory counter, that's its
> > problem, and it's the same as a GPU hang caused by that process. If a
> > process increments the memory counter but not the kernel counter, the
> > ">=" condition alongside atomic_inc guarantees that signaling n will
> > signal n+1, so it will never deadlock, but it will effectively disable
> > synchronization. This method of disabling synchronization is similar
> > to a process corrupting the buffer, which should be fine. Can you find
> > any flaw in it? I can't find any.
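To make that lock/unlock scheme concrete, here is a minimal sketch. All
names are made up, and the CPU spin loop stands in for what would really be
a wait command executed by the hw user queue:

/*
 * Minimal sketch of the per-buffer sequence-counter protocol described
 * above. Hypothetical names; the spin loop models a hw wait packet.
 */
#include <stdatomic.h>
#include <stdint.h>

struct buffer_sync {
	/* Mapped read-only into userspace; only unlock commands write it. */
	_Atomic uint64_t memory_counter;
};

/*
 * Lock: stalls the queue until the counter reaches the sequence number
 * the kernel assigned at submit time. The test is ">=", not "==", so a
 * counter that ran ahead still satisfies every older wait.
 */
static void lock_wait(struct buffer_sync *bs, uint64_t assigned_seq)
{
	while (atomic_load(&bs->memory_counter) < assigned_seq)
		; /* the hw queue would stall here, not burn CPU */
}

/*
 * Unlock: increments unconditionally, ignoring the kernel-assigned
 * number. Together with the ">=" wait, signaling n also signals n+1:
 * an over-incremented counter can only make waits pass early (disabling
 * sync for that buffer, like corrupting its contents), never get stuck.
 */
static void unlock_signal(struct buffer_sync *bs)
{
	atomic_fetch_add(&bs->memory_counter, 1);
}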
> Hm, maybe I misunderstood what exactly you wanted to do earlier. That
> kind of "we let userspace free-wheel whatever it wants, and the kernel
> ensures correctness of the resulting chain of dma_fences by resetting
> the entire context" is what I proposed too.
>
> Like you say, userspace is allowed to render garbage already.
>
> > The explicit submit can be done by userspace (if there is no
> > synchronization), but we plan to use the kernel to do it for implicit
> > sync. Essentially, the kernel will receive a buffer list and the
> > addresses of the wait commands in the user queue. It will assign new
> > sequence numbers to all buffers, write those numbers into the wait
> > commands, and ring the hw doorbell to start execution of that queue.
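A hedged sketch of that kernel-side submission flow; every type and helper
below is hypothetical, not an existing API:

/* Hypothetical kernel-side implicit-sync submit, per the quoted flow. */
#include <stddef.h>
#include <stdint.h>

struct sync_buffer {
	uint64_t kernel_seq;            /* kernel's private sequence counter */
};

struct user_submit {
	struct sync_buffer **buffers;   /* buffer list from userspace */
	uint64_t *wait_cmd_addrs;       /* wait commands inside the user queue */
	size_t count;
	void *queue;                    /* the user-mapped ring buffer */
};

/* Stubs for the hw-facing pieces. */
static void patch_wait_cmd(uint64_t addr, uint64_t seq) { (void)addr; (void)seq; }
static void ring_doorbell(void *queue) { (void)queue; }

static void implicit_sync_submit(struct user_submit *s)
{
	for (size_t i = 0; i < s->count; i++) {
		/* Assign the next kernel-side sequence number for this buffer... */
		uint64_t seq = ++s->buffers[i]->kernel_seq;

		/* ...and write it into the queue's wait command, which holds
		 * the queue until the buffer's memory counter is >= seq. */
		patch_wait_cmd(s->wait_cmd_addrs[i], seq);
	}
	/* Only then start (or resume) execution of the queue. */
	ring_doorbell(s->queue);
}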
> Yeah, for implicit sync I think using the kernel and drm/scheduler to
> sort out the dma_fence dependencies is probably best. Since you can
> choose which dma_fences you hand to the scheduler for dependency
> tracking, you can filter out your own ones and let the hw handle those
> directly (depending on how much your hw can do and all that). On i915 we
> might do that to be able to use MI_SEMAPHORE_WAIT/SIGNAL functionality
> in the hw and fw scheduler.
>
> For buffer tracking with implicit sync, I think the cleanest approach is
> probably to still keep them wrapped as dma_fence and stuffed into
> dma_resv, but conceptually it's the same. If we let every driver
> reinvent its own buffer tracking just because the hw works a bit
> differently, it'll be a mess.
>
> Wrt wait commands: I'm honestly not sure why you'd do that. Userspace
> gets to keep the pieces if it gets it wrong. You do still need to handle
> external dma_fences though, hence the drm/scheduler frontend to sort
> these out.

The reason is to prevent a lower-privileged process from
deadlocking/hanging a higher-privileged process in a way where the kernel
can't tell who did it. If the implicit-sync sequence counter is read-only
to userspace and only incrementable by the unlock-signal command after the
lock-wait command has appeared in the same queue (the two together forming
a critical section), userspace can't manipulate it arbitrarily, and we get
almost exactly the same behavior as implicit sync has today. That means
any implicitly-sync'd buffer from any process can be fully trusted by a
compositor to signal in finite time, and possibly even trusted by the
kernel.
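Purely for illustration, that pairing rule could be expressed as a check
like the following; the packet names are invented, and the real enforcement
would live in hw/fw rather than in a software loop like this:

/* Illustrative check of the lock/unlock critical-section rule. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum queue_op { OP_LOCK_WAIT, OP_DISPATCH, OP_UNLOCK_SIGNAL };

struct queue_cmd {
	enum queue_op op;
	uint64_t counter_addr;   /* per-buffer sequence counter in memory */
};

/*
 * An unlock-signal is only valid after a lock-wait on the same counter
 * earlier in the same queue; the pair brackets the job as a critical
 * section, so the read-only counter can never be incremented "cold".
 */
static bool segment_is_well_formed(const struct queue_cmd *cmds, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cmds[i].op != OP_UNLOCK_SIGNAL)
			continue;
		bool locked = false;
		for (size_t j = 0; j < i; j++) {
			if (cmds[j].op == OP_LOCK_WAIT &&
			    cmds[j].counter_addr == cmds[i].counter_addr)
				locked = true;
		}
		if (!locked)
			return false;
	}
	return true;
}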
The only thing that's different is that a malicious process can disable
implicit sync for a buffer in all processes and the kernel, but it can't
hang other processes or the kernel (it can only hang itself, and the
kernel will be notified). So I'm a happy panda now. :)

Marek