From: Marek Olšák
Date: Thu, 17 Jun 2021 14:28:06 -0400
Subject: Re: [Mesa-dev] Linux Graphics Next: Userspace submission update
To: Daniel Vetter
Cc: Christian König, Michel Dänzer, dri-devel, Jason Ekstrand, ML Mesa-dev
List-Id: Direct Rendering Infrastructure - Development

The kernel will know who should touch the implicit-sync semaphore next, and at the same time a copy of every write request to the implicit-sync semaphore will be forwarded to the kernel for monitoring and bo_wait.
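
A minimal Python sketch of the monitoring idea above, to make the bookkeeping concrete. Everything in it (the `MonitoredSemaphore` class, its method names, the process labels) is hypothetical and only models the description in this thread; it is not amdgpu code or an existing kernel interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MonitoredSemaphore:
    """Hypothetical model of one implicit-sync semaphore watched by the kernel."""
    value: int = 0                                        # last sequence number written
    next_seq: int = 1                                     # next number the kernel hands out
    owners: Dict[int, str] = field(default_factory=dict)  # seq -> process expected to signal it
    write_log: List[Tuple[str, int]] = field(default_factory=list)  # copies of all writes

    def assign(self, process: str) -> int:
        """Kernel assigns a sequence number and remembers who must signal it."""
        seq = self.next_seq
        self.next_seq += 1
        self.owners[seq] = process
        return seq

    def gpu_write(self, process: str, seq: int) -> None:
        """The write lands in memory and a copy is forwarded to the kernel."""
        self.value = seq
        self.write_log.append((process, seq))

    def next_writer(self) -> Optional[str]:
        """Who should touch the semaphore next, i.e. who is holding up the fence."""
        return self.owners.get(self.value + 1)

    def bo_wait(self, seq: int) -> bool:
        """True once the given sequence number has been reached."""
        return self.value >= seq
```

With two assigned sequence numbers, `next_writer()` names the first owner until that owner's write shows up in the log, then moves on to the second.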

Syncobjs could either use the same monitored access as implicit sync or be completely unmonitored. We haven't decided yet.

Syncfiles could either use one of the above or wait for a syncobj to go idle before converting to a syncfile.

Marek



On Thu, Jun 17, 2021 at 12:48 PM Daniel Vetter <daniel@ffwll.ch> wrote:
On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:
> As long as we can figure out who touched a certain sync object last, that
> would indeed work, yes.

Don't you need to know who will touch it next, i.e. who is holding up your
fence? Or maybe I'm just again totally confused.
-Daniel

>
> Christian.
>
> On 14.06.21 at 19:10, Marek Olšák wrote:
> > The call to the hw scheduler has a limitation on the size of all
> > parameters combined. I think we can only pass a 32-bit sequence number
> > and a ~16-bit global (per-GPU) syncobj handle in one call and not much
> > else.
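
To illustrate the size constraint, here is a small Python sketch that packs both values into a single 48-bit word. The layout (handle in the upper bits, sequence number in the lower 32) and the names `pack_signal` and `unpack_signal` are assumptions made for illustration; the thread gives only the approximate widths, not an encoding.

```python
SEQ_BITS = 32      # sequence number width stated in the thread
HANDLE_BITS = 16   # "~16-bit" in the thread; the exact width is an assumption


def pack_signal(handle: int, seq: int) -> int:
    """Pack a syncobj handle and a sequence number into one 48-bit word."""
    if not 0 <= handle < (1 << HANDLE_BITS):
        raise ValueError("syncobj handle does not fit in HANDLE_BITS")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("sequence number does not fit in SEQ_BITS")
    return (handle << SEQ_BITS) | seq


def unpack_signal(word: int) -> tuple:
    """Inverse of pack_signal: returns (handle, seq)."""
    return word >> SEQ_BITS, word & ((1 << SEQ_BITS) - 1)
```

Anything beyond these two fields would need more bits than this hypothetical word offers, which matches the "not much else" remark.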
> >
> > The syncobj handle can be an element index in a global (per-GPU) syncobj
> > table that is read-only for all processes, with the exception of the
> > signal command. Syncobjs can either have per-VMID write access flags for
> > the signal command (slow), or any process can write to any syncobj and
> > we rely only on the kernel checking the write log (fast).
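
The two write policies can be contrasted with a toy Python model. `SyncobjTable`, its `writers` sets and the `audit` method are hypothetical names; the model shows only where the permission check happens (before the write, or afterwards from the log) and says nothing about the hardware cost that makes one option slow and the other fast.

```python
class SyncobjTable:
    """Hypothetical global (per-GPU) syncobj table, indexed by handle."""

    def __init__(self, size: int, per_vmid_flags: bool):
        self.values = [0] * size                      # readable by every process
        self.writers = [set() for _ in range(size)]   # VMIDs entitled to signal each entry
        self.per_vmid_flags = per_vmid_flags          # True: check before the write
        self.write_log = []                           # (vmid, handle, seq) per accepted write

    def read(self, handle: int) -> int:
        return self.values[handle]

    def signal(self, vmid: int, handle: int, seq: int) -> bool:
        """Signal command: the only way to write an entry."""
        if self.per_vmid_flags and vmid not in self.writers[handle]:
            return False                              # rejected up front
        self.values[handle] = seq
        self.write_log.append((vmid, handle, seq))
        return True

    def audit(self) -> list:
        """Kernel-side check of the log: writes by VMIDs that were not entitled."""
        return [e for e in self.write_log if e[0] not in self.writers[e[1]]]
```

In the log-only mode a stray write still lands in memory; the kernel finds out when it walks the log, which is the trade-off described above.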
> >
> > In any case, we can execute the memory write in the queue engine and
> > only use the hw scheduler for logging, which would be perfect.
> >
> > Marek
> >
> > On Thu, Jun 10, 2021 at 12:33 PM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> >     Hi guys,
> >
> >     maybe soften that a bit. Reading from the shared memory of the
> >     user fence is ok for everybody. What we need to take more care of
> >     is the writing side.
> >
> >     So my current thinking is that we allow read-only access, but
> >     writing a new sequence value needs to go through the scheduler/kernel.
> >
> >     So when the CPU wants to signal a timeline fence it needs to call
> >     an IOCTL. When the GPU wants to signal the timeline fence it needs
> >     to hand that off to the hardware scheduler.
> >
> >     If we lock up, the kernel can check with the hardware who did the
> >     last write and what value was written.
> >
> >     That together with an IOCTL to give out sequence numbers for
> >     implicit sync to applications should be sufficient for the kernel
> >     to track who is responsible if something bad happens.
> >
> >     In other words, when the hardware says that the shader wrote stuff
> >     like 0xdeadbeef, 0x0 or 0xffffffff into memory, we kill the process
> >     that did that.
> >
> >     If the hardware says that seq - 1 was written fine, but seq is
> >     missing, then the kernel blames whoever was supposed to write seq.
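
The blame rule from the last two paragraphs can be written down as a short Python sketch. The function name `find_culprit`, the shape of the log (a list of `(process, value)` pairs) and the `owners` map from sequence number to process are all assumptions for illustration; the thread does not specify how the hardware reports its write log.

```python
def find_culprit(write_log, owners):
    """Return (process, reason) for whoever the kernel should blame, or None.

    write_log: (process, value) pairs in the order the hardware saw them.
    owners:    sequence number -> process that was assigned that number.
    """
    written = set()
    for process, value in write_log:
        # A value nobody assigned to this process (0xdeadbeef, 0x0, ...) is bogus.
        if owners.get(value) != process:
            return process, "bogus write"
        written.add(value)
    # All writes were legitimate: blame the owner of the first missing number.
    for seq in sorted(owners):
        if seq not in written:
            return owners[seq], "missing signal"
    return None
```

For example, with `owners = {1: "A", 2: "B", 3: "C"}` and a log containing only A's write of 1, the sketch blames B, because seq 1 was written fine and seq 2 is the one missing.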
> >
> >     Just piping the write through a privileged instance should be
> >     fine to make sure that we don't run into issues.
> >
> >     Christian.
> >
> >     On 10.06.21 at 17:59, Marek Olšák wrote:
> > >     Hi Daniel,
> > >
> > >     We just talked about this whole topic internally and we came
> > >     to the conclusion that the hardware needs to understand sync
> > >     object handles and have high-level wait and signal operations in
> > >     the command stream. Sync objects will be backed by memory, but
> > >     they won't be readable or writable by processes directly. The
> > >     hardware will log all accesses to sync objects and will send the
> > >     log to the kernel periodically. The kernel will identify
> > >     malicious behavior.
> > >
> > >     Example of a hardware command stream:
> > >     ...
> > >     ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence number is assigned by the kernel
> > >     Draw();
> > >     ImplicitSyncSignalWhenDone(syncObjHandle);
> > >     ...
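
How a queue might consume such a stream can be sketched in Python. The tuple encoding of commands and the `run_queue` function are hypothetical, and one detail is an outright assumption: the signal command in the example above takes no sequence number, so this model simply advances the sync object's value by one when signalling.

```python
def run_queue(commands, syncobjs):
    """Run a command stream; return how many commands were retired.

    commands: tuples such as ("wait", handle, seq), ("draw",), ("signal", handle).
    syncobjs: handle -> current sequence value (mutated by "signal").
    Execution stops at the first wait whose sequence number is not reached yet.
    """
    for index, (op, *args) in enumerate(commands):
        if op == "wait":
            handle, seq = args
            if syncobjs[handle] < seq:
                return index              # stalled: nothing after this runs
        elif op == "draw":
            pass                          # the actual GPU work would happen here
        elif op == "signal":
            (handle,) = args
            syncobjs[handle] += 1         # assumption: signalling bumps the value by one
        else:
            raise ValueError(f"unknown command {op!r}")
    return len(commands)
```

A stream of wait, draw, signal on handle 5 retires nothing while the value is below the awaited number, and retires all three commands once it has been reached.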
> > >
> > >     I'm afraid we have no other choice because of the TLB
> > >     invalidation overhead.
> > >
> > >     Marek
> > >
> > >
> > >     On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > >         On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:
> > >         > On 09.06.21 at 15:19, Daniel Vetter wrote:
> > >         > > [SNIP]
> > >         > > > Yeah, we call this the lightweight and the heavyweight
> > >         > > > TLB flush.
> > >         > > >
> > >         > > > The lightweight can be used when you are sure that you
> > >         > > > don't have any of the PTEs currently in flight in the
> > >         > > > 3D/DMA engine and you just need to invalidate the TLB.
> > >         > > >
> > >         > > > The heavyweight must be used when you need to invalidate
> > >         > > > the TLB *AND* make sure that no concurrent operation
> > >         > > > moves new stuff into the TLB.
> > >         > > >
> > >         > > > The problem is for this use case we have to use the
> > >         > > > heavyweight one.
> > >         > > Just for my own curiosity: So the lightweight flush is only
> > >         > > for in-between CS when you know access is idle? Or does that
> > >         > > also not work if userspace has a CS on a dma engine going at
> > >         > > the same time because the tlb aren't isolated enough between
> > >         > > engines?
> > >         >
> > >         > More or less correct, yes.
> > >         >
> > >         > The problem is a lightweight flush only invalidates the TLB,
> > >         > but doesn't take care of entries which have been handed out
> > >         > to the different engines.
> > >         >
> > >         > In other words what can happen is the following:
> > >         >
> > >         > 1. Shader asks TLB to resolve address X.
> > >         > 2. TLB looks into its cache and can't find address X so it
> > >         >    asks the walker to resolve.
> > >         > 3. Walker comes back with result for address X and TLB puts
> > >         >    that into its cache and gives it to Shader.
> > >         > 4. Shader starts doing some operation using result for
> > >         >    address X.
> > >         > 5. You send lightweight TLB invalidate and TLB throws away
> > >         >    cached values for address X.
> > >         > 6. Shader happily still uses whatever the TLB gave to it in
> > >         >    step 3 to access address X.
> > >         >
> > >         > See it like the shader has its own 1-entry L0 TLB cache
> > >         > which is not affected by the lightweight flush.
> > >         >
> > >         > The heavyweight flush on the other hand sends out a broadcast
> > >         > signal to everybody and only comes back when we are sure that
> > >         > an address is not in use any more.
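
The six steps and the difference between the two flushes can be replayed with a toy Python model. The `Tlb` and `Shader` classes are hypothetical simplifications: the page table is a plain dict, the walker is a dict lookup, and the heavyweight flush is modelled as dropping every shader's held translation rather than as a broadcast that waits for in-flight work.

```python
class Tlb:
    """Toy TLB: caches translations that the walker reads from the page table."""

    def __init__(self, page_table):
        self.page_table = page_table   # virtual address -> physical page
        self.cache = {}
        self.shaders = []              # engines that may hold handed-out entries

    def resolve(self, va):
        if va not in self.cache:                    # step 2: miss, ask the walker
            self.cache[va] = self.page_table[va]    # step 3: fill the cache
        return self.cache[va]

    def lightweight_flush(self):
        self.cache.clear()             # step 5: only the TLB's own cache is dropped

    def heavyweight_flush(self):
        self.cache.clear()
        for shader in self.shaders:    # modelled broadcast: nobody keeps an old entry
            shader.held = None


class Shader:
    """Toy shader with the one-entry "L0" described above."""

    def __init__(self, tlb):
        self.tlb = tlb
        self.held = None               # (va, translation) handed out by the TLB
        tlb.shaders.append(self)

    def access(self, va):
        if self.held is None or self.held[0] != va:
            self.held = (va, self.tlb.resolve(va))  # steps 1 to 4
        return self.held[1]                         # step 6: may be stale
```

After the page table entry for X changes, a lightweight flush leaves the shader using the old translation, while a heavyweight flush forces it to resolve X again.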
> > >
> > >         Ah makes sense. On Intel the shaders only operate in VA,
> > >         everything goes around as explicit async messages to IO
> > >         blocks. So we don't have this; the only difference in TLB
> > >         flushes is between a TLB flush in the IB and an MMIO one,
> > >         which is independent of anything currently being executed
> > >         on an engine.
> > >         -Daniel
> > >         --
> > >         Daniel Vetter
> > >         Software Engineer, Intel Corporation
> > >         http://blog.ffwll.ch
> > >
> >
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch