Linux-bcache Archive mirror
From: Eric Wheeler <bcache@lists.ewheeler.net>
To: Coly Li <colyli@suse.de>
Cc: Adriano Silva <adriano_da_silva@yahoo.com.br>,
	Bcache Linux <linux-bcache@vger.kernel.org>,
	Martin McClure <martin.mcclure@gemtalksystems.com>
Subject: Re: Writeback cache all used.
Date: Sat, 13 May 2023 14:05:26 -0700 (PDT)
Message-ID: <b33952ec-da5d-ad1-c710-ac374955215@ewheeler.net>
In-Reply-To: <pcwgnlftmhdpqt7byqfinxrjif7laqchpkhv2a4kae46u2mbrf@7eqa2sq6xbzx>


On Fri, 12 May 2023, Coly Li wrote:
> On Thu, May 11, 2023 at 04:10:51PM -0700, Eric Wheeler wrote:
> > On Tue, 9 May 2023, Adriano Silva wrote:
> > > I got the parameters with this script, although I also checked /sys; doing the math, everything is right.
> > > 
> > > https://gist.github.com/damoxc/6267899
> > 
> > Thanks.  prio_stats gives me what I'm looking for.  More below.
> >  
> > > On Monday, May 8, 2023 at 21:42:26 BRT, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> > > On Thu, 4 May 2023, Coly Li wrote:
> > > > > On May 3, 2023, at 04:34, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> > > > > 
> > > > > On Thu, 20 Apr 2023, Adriano Silva wrote:
> > > > >> I continue to investigate the situation. There is actually a performance 
> > > > >> gain when the bcache device is only half full versus completely full. Direct 
> > > > >> writes show lower and more stable latency, and this 
> > > > >> improves my scenario.
> > > > > 
> > > > > Hi Coly, have you been able to look at this?
> > > > > 
> > > > > This sounds like a great optimization and Adriano is in a place to test 
> > > > > this now and report his findings.
> > > > > 
> > > > > I think you said this should be a simple hack to add early reclaim, so 
> > > > > maybe you can throw a quick patch together (even a rough first-pass with 
> > > > > hard-coded reclaim values)
> > > > > 
> > > > > If we can get back to Adriano quickly then he can test while he has an 
> > > > > easy-to-reproduce environment.  Indeed, this could benefit all bcache 
> > > > > users.
> > > > 
> > > > My current to-do list is a bit long. Yes, I'd like to and plan to 
> > > > do it, but the response time cannot be estimated.
> > > 
> > > I understand.  Maybe I can put something together if you can provide some 
> > > pointers since you are _the_ expert on bcache these days.  Here are a few 
> > > questions:
> > > 
> > > Q's for Coly:
> > 
> > 
> > It _looks_ like bcache only frees buckets until the `ca->free_inc` list 
> > is full, but it could go further.  Consider this hypothetical:
> > 
> 
> Hi Eric,
> 
> Bcache starts invalidating buckets when ca->free_inc is not full, and selects
> candidate buckets to invalidate according to the replacement policy. It then keeps
> calling bch_invalidate_one_bucket(), pushing each invalidated bucket into ca->free_inc,
> until the list is full or there are no more candidate buckets to invalidate.

Understood.  The goal: to work around Adriano's performance issue, we want 
to invalidate buckets even after free_inc is full.  If we can keep ~20% of 
buckets completely unused (i.e., !GC_SECTORS_USED(b)), then I think it will 
fix his issue.  That is the theory we wish to test and validate.

> > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/md/bcache/alloc.c#L179
> > 
> > 	static void invalidate_buckets_lru(struct cache *ca)
> > 	{
> > 	...
> 
> The mutex_lock()/unlock() may introduce a deadlock: before invalidate_buckets() is
> called, after allocator_wait() returns, the mutex bucket_lock is already held again.

I see what you mean.  Maybe the bucket lock is already held; if so then I 
don't need to grab it again. For now I've pulled the mutex_lock lines for 
discussion.

We only use for_each_bucket() to get a "fuzzy" count of `available` 
buckets (pseudo code updated below). It doesn't need to be exact.

Here is some cleaned-up, concise pseudocode for discussion (not yet 
compile-tested):

+	int available = 0;
+
+	//mutex_lock(&ca->set->bucket_lock);
+	for_each_bucket(b, ca) {
+		if (GC_MARK(b) == GC_MARK_RECLAIMABLE)
+			available++;
+	}
+	//mutex_unlock(&ca->set->bucket_lock);
+
-	while (!fifo_full(&ca->free_inc)) {
+	while (!fifo_full(&ca->free_inc) || available < TARGET_AVAIL_BUCKETS) {
		...
 		bch_invalidate_one_bucket(ca, b);  <<<< this does the work
+		available++;
	}

Changes from previous post:

  - `available` was not incremented; now it is, so the loop can 
    terminate.
  - Removed the other counters for clarity, we only care about 
    GC_MARK_RECLAIMABLE for this discussion.
  - Ignore locking for now
  
(TARGET_AVAIL_BUCKETS is a placeholder; ultimately it would be a sysfs 
setting, probably expressed as a percentage.)
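
For discussion, a minimal sketch of how a percentage knob could be turned
into the bucket-count target used above.  The function name and the
`avail_percent` parameter are hypothetical (not existing bcache code);
ca->sb.nbuckets is the total bucket count on the cache device:

	#include <linux/math64.h>

	/* Hypothetical helper: convert a percentage (e.g. 20) into the
	 * number of buckets we want to keep available on this cache. */
	static inline long target_avail_buckets(struct cache *ca,
						unsigned int avail_percent)
	{
		return div_u64(ca->sb.nbuckets * avail_percent, 100);
	}

The sysfs plumbing itself would presumably follow the existing patterns in
drivers/md/bcache/sysfs.c.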
 
> If ca->free_inc is full and you still try to invalidate more candidate 
> buckets, the next bucket selected (by the heap_pop) will be 
> invalidated in bch_invalidate_one_bucket() and pushed into ca->free_inc. 
> But ca->free_inc is already full, so the next time invalidate_buckets_lru() 
> is called, this already-invalidated bucket will be accessed and 
> checked again in for_each_bucket(). This is just a waste of CPU cycles.

True. I was aware of this performance issue when I wrote that; bcache 
takes ~1s to iterate for_each_bucket() on my system.  Right now we just 
want to keep ~20% of buckets completely unused and verify 
correctness...and then I can work on hiding the bucket counting overhead 
caused by for_each_bucket().
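
(For later reference, one possible way to hide that cost, sketched under
assumptions: `ca->avail_nbuckets` and `ca->avail_recount_jiffies` are
hypothetical fields that do not exist in `struct cache` today.  The idea is
to do the full for_each_bucket() scan at most once per interval and lean on
the incremental `available++` from the loop above in between.)

	/* Hypothetical: return a cached count of reclaimable buckets,
	 * rescanning with for_each_bucket() only every ~30 seconds. */
	static long count_reclaimable_buckets(struct cache *ca)
	{
		struct bucket *b;
		long available = 0;

		if (time_before(jiffies, ca->avail_recount_jiffies))
			return ca->avail_nbuckets;

		for_each_bucket(b, ca)
			if (GC_MARK(b) == GC_MARK_RECLAIMABLE)
				available++;

		ca->avail_nbuckets = available;
		ca->avail_recount_jiffies = jiffies + 30 * HZ;

		return available;
	}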

> Furthermore, __bch_invalidate_one_bucket() will increment the bucket's gen number and
> its pin counter. Doing this without pushing the bucket into ca->free_inc makes me
> feel uncomfortable.

Questions for my understanding: 

- Is free_inc just a reserve list such that most buckets are in the heap 
  after a fresh `make-bcache -C <cdev>`?

- What is the difference between buckets in free_inc and buckets in the 
  heap? Do they overlap?

I assume you mean this:

	void __bch_invalidate_one_bucket(...) { 
		...
		bch_inc_gen(ca, b);
		b->prio = INITIAL_PRIO;
		atomic_inc(&b->pin);

If I understand the internals of bcache, the `gen` is just a counter that 
increments to make the bucket "newer" than another referenced version.  
Incrementing the `gen` on an unused bucket should be safe, but please 
correct me if I'm wrong here.
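
(For illustration only, my reading of why that is safe, simplified from
bcache's gen_after()/ptr_stale() rather than quoting them verbatim: a btree
pointer records the bucket's gen at allocation time, so bumping b->gen
afterwards makes any pointer still carrying the old gen look stale.)

	#include <stdint.h>
	#include <stdbool.h>

	/* Simplified, wrap-aware "is the bucket gen newer than the
	 * pointer gen?" check, in the spirit of bcache's gen_after(). */
	static bool ptr_looks_stale(uint8_t bucket_gen, uint8_t ptr_gen)
	{
		uint8_t diff = bucket_gen - ptr_gen;

		return diff != 0 && diff < 128;
	}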

I'm not familiar with b->pin; it doesn't appear to be commented in `struct 
bucket`, and I didn't see it used in bch_allocator_push().

What is b->pin used for?

> > Coly, would this work?
> 
> It should work for some kinds of workloads, but will introduce complaints for other kinds of workloads.

As this is written above, I agree.  Right now I'm just trying to 
understand the code well enough to free buckets preemptively so allocation 
doesn't happen during IO.  For now please ignore the cost of 
for_each_bucket().

> > Can you think of any serious issues with this (besides the fact that 
> > for_each_bucket is slow)?
> > 
> 
> I don't feel this change will help bcache invalidate the clean 
> buckets without extra cost.

For the example pseudocode above that is true, and for now I am _not_ 
trying to address performance.
 
> It is not simple for me to suggest a solution without careful thought; this 
> is a tradeoff of gain and cost...

Certainly that is the end goal, but first I need to understand the code 
well enough to invalidate buckets down to 20% free and still maintain 
correctness.

Thanks for your help understanding this.

-Eric

> 
> Thanks.
> 
> [snipped]
> 
> -- 
> Coly Li
> 


Thread overview: 28+ messages
     [not found] <1012241948.1268315.1680082721600.ref@mail.yahoo.com>
2023-03-29  9:38 ` Writeback cache all used Adriano Silva
2023-03-29 19:18   ` Eric Wheeler
2023-03-30  1:38     ` Adriano Silva
2023-03-30  4:55   ` Martin McClure
2023-03-31  0:17     ` Adriano Silva
2023-04-02  0:01       ` Eric Wheeler
2023-04-03  7:14         ` Coly Li
2023-04-03 19:27           ` Eric Wheeler
2023-04-04  8:19             ` Coly Li
2023-04-04 20:29               ` Adriano Silva
2023-04-05 13:57                 ` Coly Li
2023-04-05 19:24                   ` Eric Wheeler
2023-04-05 19:31                   ` Adriano Silva
2023-04-06 21:21                     ` Eric Wheeler
2023-04-07  3:15                       ` Adriano Silva
2023-04-09 16:37                     ` Coly Li
2023-04-09 20:14                       ` Adriano Silva
2023-04-09 21:07                         ` Adriano Silva
2023-04-20 11:35                           ` Adriano Silva
2023-05-02 20:34                             ` Eric Wheeler
2023-05-04  4:56                               ` Coly Li
2023-05-04 14:34                                 ` Adriano Silva
2023-05-09  0:29                                   ` Eric Wheeler
2023-05-09  0:42                                 ` Eric Wheeler
2023-05-09  2:21                                   ` Adriano Silva
2023-05-11 23:10                                     ` Eric Wheeler
2023-05-12  5:13                                       ` Coly Li
2023-05-13 21:05                                         ` Eric Wheeler [this message]
