Linux-Block Archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Marco Elver <elver@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
	Breno Leitao <leitao@debian.org>, Jens Axboe <axboe@kernel.dk>,
	"open list:BLOCK LAYER" <linux-block@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()
Date: Wed, 15 May 2024 08:57:44 -0700	[thread overview]
Message-ID: <75421237-4c5a-48bc-849e-87a216ee9d32@paulmck-laptop> (raw)
In-Reply-To: <CANpmjNMqRUNUs1mZEhrOSyK0Hk+PdGOi+VAs22qYD+1zTkwfhg@mail.gmail.com>

On Wed, May 15, 2024 at 09:58:35AM +0200, Marco Elver wrote:
> On Wed, 15 May 2024 at 01:47, Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> > > On Sat, 11 May 2024 at 02:41, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > [...]
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit 930cb5f711443d8044e88080ee21b0a5edda33bd
> > > > Author: Paul E. McKenney <paulmck@kernel.org>
> > > > Date:   Fri May 10 15:36:57 2024 -0700
> > > >
> > > >     kcsan: Add example to data_race() kerneldoc header
> > > >
> > > >     Although the data_race() kerneldoc header accurately states what it does,
> > > >     some of the implications and usage patterns are non-obvious.  Therefore,
> > > >     add a brief locking example and also state how to have KCSAN ignore
> > > >     accesses while also preventing the compiler from folding, spindling,
> > > >     or otherwise mutilating the access.
> > > >
> > > >     [ paulmck: Apply Bart Van Assche feedback. ]
> > > >
> > > >     Reported-by: Bart Van Assche <bvanassche@acm.org>
> > > >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > >     Cc: Marco Elver <elver@google.com>
> > > >     Cc: Breno Leitao <leitao@debian.org>
> > > >     Cc: Jens Axboe <axboe@kernel.dk>
> > > >
> > > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > > index c00cc6c0878a1..9249768ec7a26 100644
> > > > --- a/include/linux/compiler.h
> > > > +++ b/include/linux/compiler.h
> > > > @@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > > >   * This data_race() macro is useful for situations in which data races
> > > >   * should be forgiven.  One example is diagnostic code that accesses
> > > >   * shared variables but is not a part of the core synchronization design.
> > > > + * For example, if accesses to a given variable are protected by a lock,
> > > > + * except for diagnostic code, then the accesses under the lock should
> > > > + * be plain C-language accesses and those in the diagnostic code should
> > > > + * use data_race().  This way, KCSAN will complain if buggy lockless
> > > > + * accesses to that variable are introduced, even if the buggy accesses
> > > > + * are protected by READ_ONCE() or WRITE_ONCE().
> > > >   *
> > > > - * This macro *does not* affect normal code generation, but is a hint
> > > > - * to tooling that data races here are to be ignored.
> > > > + * This macro *does not* affect normal code generation, but is a hint to
> > > > + * tooling that data races here are to be ignored.  If code generation must
> > > > + * be protected *and* KCSAN should ignore the access, use both data_race()
> > >
> > > "code generation must be protected" seems ambiguous, because
> > > protecting code generation could mean a lot of different things to
> > > different people.
> > >
> > > The more precise thing would be to write that "If the access must be
> > > atomic *and* KCSAN should ignore the access, use both ...".
> >
> > Good point, and I took your wording, thank you.
> >
> > > I've also had trouble in the past referring to "miscompilation" or
> > > similar, because that also entirely depends on the promised vs.
> > > expected semantics: if the code in question assumes for the access to
> > > be atomic, the compiler compiling the code in a way that the access is
> > > no longer atomic would be a "miscompilation". Although is it still a
> > > "miscompilation" if the compiler generated code according to specified
> > > language semantics (say according to our own LKMM) - and that's where
> > > opinions can diverge because it depends on which side we stand
> > > (compiler vs. our code).
> >
> > Agreed, use of words like "miscompilation" can annoy people.  What
> > word would you suggest using instead?
> 
> Not sure. As suggested above, I try to just stick to "atomic" vs
> "non-atomic" because that's ultimately the functional end result of
> such a miscompilation. Although I've also had people be confused as in
> "what atomic?! as in atomic RMW?!", but I don't know how to remove
> that kind of confusion.
> 
> If, however, our intended model is the LKMM and e.g. a compiler breaks
> a dependency-chain, then we could talk about miscompilation, because
> the compiler violates our desired language semantics. Of course the
> compiler writers then will say that we try to do things that are
> outside any implemented language semantics the compiler is aware of,
> so it's not a miscompilation again. So it all depends on which side
> we're arguing for. Fun times.

;-) ;-) ;-)

> > > > + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
> > >
> > > Having more documentation sounds good to me, thanks for adding!
> > >
> > > This extra bit of documentation also exists in a longer form in
> > > access-marking.txt, correct? I wonder how it would be possible to
> > > refer to it, in case the reader wants to learn even more.
> >
> > Good point, especially given that I had forgotten about it.
> >
> > I don't have any immediate ideas for calling attention to this file,
> > but would the following update be helpful?
> 
> Mentioning __data_racy along with data_race() could be helpful, thank
> you. See comments below.

I did add a mention of it in "Linux-Kernel RCU Shared-Variable Marking"
[1], but just a mention, given that I do not expect that we will use it
within RCU.

> Thanks,
> -- Marco
> 
> >                                                         Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> > index 65778222183e3..690dd59b7ac59 100644
> > --- a/tools/memory-model/Documentation/access-marking.txt
> > +++ b/tools/memory-model/Documentation/access-marking.txt
> > @@ -24,6 +24,12 @@ The Linux kernel provides the following access-marking options:
> >  4.     WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
> >         The various forms of atomic_set() also fit in here.
> >
> > +5.     ASSERT_EXCLUSIVE_ACCESS() and ASSERT_EXCLUSIVE_WRITER().
> 
> Perhaps worth mentioning, but they aren't strictly access-marking
> options. In the interest of simplicity could leave it out.

Interestingly enough, they can be said to be implicitly marking other
concurrent accesses to the variable.  ;-)

I believe that the do need to be mentioned more prominently, though.

Maybe a second list following this one?  In that case, what do we name
the list?  I suppose the other alternative would be to leave them in
this list, and change the preceding sentence to say something like
"The Linux kernel provides the following access-marking-related primitives"

Thoughts?

> > +6.     The volatile keyword, for example, "int volatile a;"
> 
> See below - I'm not sure we should mention volatile. It may set the
> wrong example.

Good point.  I removed this item from this list.

> > +7.     __data_racy, for example "int __data_racy a;"
> > +
> >
> >  These may be used in combination, as shown in this admittedly improbable
> >  example:
> > @@ -205,6 +211,27 @@ because doing otherwise prevents KCSAN from detecting violations of your
> >  code's synchronization rules.
> >
> >
> > +Use of volatile and __data_racy
> > +-------------------------------
> > +
> > +Adding the volatile keyword to the declaration of a variable causes both
> > +the compiler and KCSAN to treat all reads from that variable as if they
> > +were protected by READ_ONCE() and all writes to that variable as if they
> > +were protected by WRITE_ONCE().
> 
> "volatile" isn't something we encourage, right? In which case, I think
> to avoid confusion we should not mention volatile. After all we have
> this: Documentation/process/volatile-considered-harmful.rst

Good point, I removed this paragraph.  But we do sometimes use volatile,
for example for atomic_t and jiffies.  Nevertheless, agreed, we don't
want to encourage it and additions of this keyword should be subjected
to serious scrutiny.

> > +Adding the __data_racy type qualifier to the declaration of a variable
> > +causes KCSAN to treat all accesses to that variable as if they were
> > +enclosed by data_race().  However, __data_racy does not affect the
> > +compiler, though one could imagine hardened kernel builds treating the
> > +__data_racy type qualifier as if it was the volatile keyword.
> > +
> > +Note well that __data_racy is subject to the same pointer-declaration
> > +rules as is the volatile keyword.  For example:
> 
> To avoid referring to volatile could make it more general and say "..
> rules as other type qualifiers like const and volatile".

Very good, thank you!  I happily took your wording.

							Thanx, Paul

> > +       int __data_racy *p; // Pointer to data-racy data.
> > +       int *__data_racy p; // Data-racy pointer to non-data-racy data.
> > +
> > +
> >  ACCESS-DOCUMENTATION OPTIONS
> >  ============================
> >
> > @@ -342,7 +369,7 @@ as follows:
> >
> >  Because foo is read locklessly, all accesses are marked.  The purpose
> >  of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
> > -concurrent lockless write.
> > +concurrent write, whether marked or not.
> >
> >
> >  Lock-Protected Writes With Heuristic Lockless Reads

  parent reply	other threads:[~2024-05-15 15:57 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-10 14:19 [PATCH] block: Annotate a racy read in blk_do_io_stat() Breno Leitao
2024-05-10 14:28 ` Bart Van Assche
2024-05-10 14:57   ` Breno Leitao
2024-05-10 15:41   ` Paul E. McKenney
2024-05-10 16:20     ` Bart Van Assche
2024-05-10 17:08       ` Paul E. McKenney
2024-05-10 20:30         ` Bart Van Assche
2024-05-10 22:35           ` Paul E. McKenney
2024-05-10 23:22             ` Bart Van Assche
2024-05-11  0:41               ` Paul E. McKenney
2024-05-13  8:13                 ` Marco Elver
2024-05-14 23:47                   ` Paul E. McKenney
2024-05-15  7:58                     ` Marco Elver
2024-05-15 12:48                       ` Breno Leitao
2024-05-15 13:20                         ` Marco Elver
2024-05-15 15:57                       ` Paul E. McKenney [this message]
2024-05-15 17:40                         ` Marco Elver
2024-05-15 21:51                           ` Paul E. McKenney
2024-05-16  6:35                             ` Marco Elver
2024-05-20 18:05                               ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=75421237-4c5a-48bc-849e-87a216ee9d32@paulmck-laptop \
    --to=paulmck@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=elver@google.com \
    --cc=leitao@debian.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).