From: Eric Wheeler <bcache@lists.ewheeler.net>
To: Coly Li <colyli@suse.de>
Cc: Adriano Silva <adriano_da_silva@yahoo.com.br>,
Bcache Linux <linux-bcache@vger.kernel.org>,
Martin McClure <martin.mcclure@gemtalksystems.com>
Subject: Re: Writeback cache all used.
Date: Sat, 13 May 2023 14:05:26 -0700 (PDT) [thread overview]
Message-ID: <b33952ec-da5d-ad1-c710-ac374955215@ewheeler.net> (raw)
In-Reply-To: <pcwgnlftmhdpqt7byqfinxrjif7laqchpkhv2a4kae46u2mbrf@7eqa2sq6xbzx>
On Fri, 12 May 2023, Coly Li wrote:
> On Thu, May 11, 2023 at 04:10:51PM -0700, Eric Wheeler wrote:
> > On Tue, 9 May 2023, Adriano Silva wrote:
> > > I got the parameters with this script, though I also checked /sys; doing the math, everything is right.
> > >
> > > https://gist.github.com/damoxc/6267899
> >
> > Thanks. prio_stats gives me what I'm looking for. More below.
> >
> > > On Monday, May 8, 2023 at 21:42:26 BRT, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> > > On Thu, 4 May 2023, Coly Li wrote:
> > > > > On May 3, 2023, at 04:34, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> > > > >
> > > > > On Thu, 20 Apr 2023, Adriano Silva wrote:
> > > > >> I continue to investigate the situation. There is actually a performance
> > > > >> gain when the bcache device is only half filled versus full. There is a
> > > > >> reduction and greater stability in the latency of direct writes and this
> > > > >> improves my scenario.
> > > > >
> > > > > Hi Coly, have you been able to look at this?
> > > > >
> > > > > This sounds like a great optimization and Adriano is in a place to test
> > > > > this now and report his findings.
> > > > >
> > > > > I think you said this should be a simple hack to add early reclaim, so
> > > > > maybe you can throw a quick patch together (even a rough first-pass with
> > > > > hard-coded reclaim values)
> > > > >
> > > > > If we can get back to Adriano quickly then he can test while he has an
> > > > > easy-to-reproduce environment. Indeed, this could benefit all bcache
> > > > > users.
> > > >
> > > > My current to-do list is a little long. Yes, I'd like to and plan
> > > > to do it, but I cannot estimate the response time.
> > >
> > > I understand. Maybe I can put something together if you can provide some
> > > pointers since you are _the_ expert on bcache these days. Here are a few
> > > questions:
> > >
> > > Q's for Coly:
> >
> >
> > It _looks_ like bcache frees buckets while the `ca->free_inc` list is
> > full, but it could go further. Consider this hypothetical:
> >
>
> Hi Eric,
>
> Bcache starts to invalidate buckets when ca->free_inc is full, selecting
> buckets to invalidate via the replacement policy. It then calls
> bch_invalidate_one_bucket() repeatedly, pushing each invalidated bucket
> into ca->free_inc until the list is full or there are no more candidate
> buckets to invalidate.
Understood. The goal: to work around Adriano's performance issue, we want
to invalidate buckets even after free_inc is full. If we can keep ~20% of
buckets completely unused (i.e., !GC_SECTORS_USED(b)), then I think it will
fix his issue. That is the theory we wish to test and validate.
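To make sure I follow the fill loop you describe, here is a toy model of
it (FREE_INC_SIZE and these structures are illustrative stand-ins for
discussion, not the real bcache code):

```c
/* Toy model: pop candidate buckets from an (already ordered) candidate
 * list and push them into a fixed-size free_inc fifo until the fifo is
 * full or no candidates remain -- the loop shape of invalidate_buckets_lru().
 */
#include <assert.h>
#include <stdbool.h>

#define FREE_INC_SIZE 4

struct fifo {
	int data[FREE_INC_SIZE];
	int used;
};

static bool fifo_full(const struct fifo *f)
{
	return f->used == FREE_INC_SIZE;
}

static void fifo_push(struct fifo *f, int v)
{
	f->data[f->used++] = v;
}

/* Stand-in for the heap_pop() + bch_invalidate_one_bucket() loop;
 * returns the number of buckets invalidated this pass. */
static int invalidate_buckets(struct fifo *free_inc,
			      const int *candidates, int ncandidates)
{
	int i = 0;

	while (!fifo_full(free_inc) && i < ncandidates)
		fifo_push(free_inc, candidates[i++]);

	return i;
}
```

The model stops exactly when the fifo fills, which is the behavior we are
trying to extend past.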
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/md/bcache/alloc.c#L179
> >
> > static void invalidate_buckets_lru(struct cache *ca)
> > {
> > ...
>
> The mutex_lock()/unlock() may introduce a deadlock. Before
> invalidate_buckets() is called, after allocator_wait() returns, the mutex
> lock bucket_lock is held again.
I see what you mean. Maybe the bucket lock is already held; if so then I
don't need to grab it again. For now I've pulled the mutex_lock lines for
discussion.
We only use for_each_bucket() to get a "fuzzy" count of `available`
buckets (pseudo code updated below). It doesn't need to be exact.
Here is some cleaned up and concise pseudo code for discussion (I've not
yet compile tested):
+ int available = 0;
+
+ //mutex_lock(&ca->set->bucket_lock);
+ for_each_bucket(b, ca) {
+ if (GC_MARK(b) == GC_MARK_RECLAIMABLE)
+ available++;
+ }
+ //mutex_unlock(&ca->set->bucket_lock);
+
- while (!fifo_full(&ca->free_inc)) {
+ while (!fifo_full(&ca->free_inc) || available < TARGET_AVAIL_BUCKETS) {
...
bch_invalidate_one_bucket(ca, b); <<<< this does the work
+ available++;
}
Changes from previous post:
- `available` was not incremented before; now it is, so the loop can
terminate.
- Removed the other counters for clarity, we only care about
GC_MARK_RECLAIMABLE for this discussion.
- Ignore locking for now
(TARGET_AVAIL_BUCKETS is a placeholder, ultimately it would be a sysfs
setting, probably a percentage.)
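As a sketch of that sysfs idea: the knob would likely store a percentage
and the allocator would convert it to a bucket count. The helper name and
round-up rounding below are my assumptions for discussion; no such knob
exists in bcache today:

```c
/* Hypothetical: map a sysfs percentage to a bucket-count target
 * (what TARGET_AVAIL_BUCKETS above would be computed from). */
#include <assert.h>
#include <stddef.h>

static size_t target_avail_buckets(size_t nbuckets, unsigned int percent)
{
	/* Round up so any nonzero percentage yields a nonzero target. */
	return (nbuckets * percent + 99) / 100;
}
```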
> If ca->free_inc is full and you still try to invalidate more candidate
> buckets, the next bucket selected (by the heap_pop) will be invalidated
> in bch_invalidate_one_bucket() and pushed into ca->free_inc. But
> ca->free_inc is full, so the next time invalidate_buckets_lru() is
> called, this already-invalidated bucket will be accessed and checked
> again in for_each_bucket(). This is just a waste of CPU cycles.
True. I was aware of this performance issue when I wrote that; bcache
takes ~1s to iterate for_each_bucket() on my system. Right now we just
want to keep ~20% of buckets completely unused and verify
correctness...and then I can work on hiding the bucket counting overhead
caused by for_each_bucket().
> Furthermore, __bch_invalidate_one_bucket() will increment the bucket's gen
> number and its pin counter. Doing this without pushing the bucket into
> ca->free_inc makes me feel uncomfortable.
Questions for my understanding:
- Is free_inc just a reserve list such that most buckets are in the heap
after a fresh `make-bcache -C <cdev>`?
- What is the difference between buckets in free_inc and buckets in the
heap? Do they overlap?
I assume you mean this:
void __bch_invalidate_one_bucket(...) {
...
bch_inc_gen(ca, b);
b->prio = INITIAL_PRIO;
atomic_inc(&b->pin);
If I understand the internals of bcache, the `gen` is just a counter that
increments to make the bucket "newer" than another referenced version.
Incrementing the `gen` on an unused bucket should be safe, but please
correct me if I'm wrong here.
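To state that mental model concretely, here is a toy version (a
simplification for discussion, not bcache's real structures or on-disk
format): a pointer copies the bucket's gen when it is created, and bumping
the bucket's gen makes every outstanding pointer stale.

```c
/* Toy model of bucket generations: ptr_stale() is true once the bucket's
 * gen has moved past the gen recorded in the pointer. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_bucket { uint8_t gen; };
struct toy_ptr    { uint8_t gen; };

static struct toy_ptr make_ptr(const struct toy_bucket *b)
{
	struct toy_ptr p = { .gen = b->gen };
	return p;
}

static bool ptr_stale(const struct toy_bucket *b, const struct toy_ptr *p)
{
	return b->gen != p->gen;
}
```

If this model is right, then bch_inc_gen() on a bucket with no live
pointers has nothing to invalidate, which is why I expect it to be safe.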
I'm not familiar with b->pin, it doesn't appear to be commented in `struct
bucket` and I didn't see it used in bch_allocator_push().
What is b->pin used for?
> > Coly, would this work?
>
> It should work for some kinds of workloads, but will introduce complaints for other kinds of workloads.
As this is written above, I agree. Right now I'm just trying to
understand the code well enough to free buckets preemptively so allocation
doesn't happen during IO. For now please ignore the cost of
for_each_bucket().
> > Can you think of any serious issues with this (besides the fact that
> > for_each_bucket is slow)?
> >
>
> I don't feel this change will help bcache invalidate the clean
> buckets without extra cost.
For the example pseudo code above that is true, and for now I am _not_
trying to address performance.
> It is not simple for me to propose a solution without careful thought;
> this is a tradeoff between gain and cost...
Certainly that is the end goal, but first I need to understand the code
well enough to invalidate buckets down to 20% free and still maintain
correctness.
Thanks for your help understanding this.
-Eric
>
> Thanks.
>
> [snipped]
>
> --
> Coly Li
>