Linux-bcache Archive mirror
From: Coly Li <colyli@suse.de>
To: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc: bcache@lists.ewheeler.net, linux-bcache@vger.kernel.org,
	zoumingzhe@qq.com, stable@vger.kernel.org
Subject: Re: [PATCH v2] bcache: fixup lock c->root error
Date: Fri, 8 Sep 2023 18:26:17 +0800
Message-ID: <moxkbitg22zh4mkfnkhyxyj5eeiqkjvpr6aw7yx5vperpygn5g@fqyaitlpyhpo>
In-Reply-To: <20230908034108.405-1-mingzhe.zou@easystack.cn>

On Fri, Sep 08, 2023 at 11:41:08AM +0800, Mingzhe Zou wrote:
> We hit an I/O hang: a thread was blocked waiting to acquire the lock
> on c->root, and the lock was never released.
> 
> crash> cache_set.root -l cache_set.list ffffa03fde4c0050
>   root = 0xffff802ef454c800
> crash> btree -o 0xffff802ef454c800 | grep rw_semaphore
>   [ffff802ef454c858] struct rw_semaphore lock;
> crash> struct rw_semaphore ffff802ef454c858
> struct rw_semaphore {
>   count = {
>     counter = -4294967297
>   },
>   wait_list = {
>     next = 0xffff00006786fc28,
>     prev = 0xffff00005d0efac8
>   },
>   wait_lock = {
>     raw_lock = {
>       {
>         val = {
>           counter = 0
>         },
>         {
>           locked = 0 '\000',
>           pending = 0 '\000'
>         },
>         {
>           locked_pending = 0,
>           tail = 0
>         }
>       }
>     }
>   },
>   osq = {
>     tail = {
>       counter = 0
>     }
>   },
>   owner = 0xffffa03fdc586603
> }
> 
> The value "counter = -4294967297" means the lock count is -1 while a
> write lock is being attempted. We then found a btree node on
> btree_cache_freeable whose lock counter is 1.
> 
> crash> cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache
>   [ffffa03fde4c1140] struct list_head btree_cache;
>   [ffffa03fde4c1150] struct list_head btree_cache_freeable;
>   [ffffa03fde4c1160] struct list_head btree_cache_freed;
>   [ffffa03fde4c1170] unsigned int btree_cache_used;
>   [ffffa03fde4c1178] wait_queue_head_t btree_cache_wait;
>   [ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock;
> crash> list -H ffffa03fde4c1140|wc -l
> 973
> crash> list -H ffffa03fde4c1150|wc -l
> 1123
> crash> cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050
>   btree_cache_used = 2097
> crash> list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^  lock = {" > btree_cache.txt
> crash> list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^  lock = {" > btree_cache_freeable.txt
> [root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd
> /var/crash/127.0.0.1-2023-08-04-16:40:28
> [root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0"
> [root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0"
>       counter = 1
> 
> We found this is a bug in bch_sectors_dirty_init() when it locks c->root:
>     (1). Thread X takes the write lock on c->root (node A).
>     (2). Thread Y tries to take the read lock on c->root (node A) and blocks.
>     (3). Thread X calls bch_btree_set_root(), changing c->root from A to B.
>     (4). Thread X releases the write lock on node A.
>     (5). Thread Y acquires the read lock on node A.
>     (6). Thread Y releases the read lock on c->root, which is now node B,
>          leaving node A locked forever.
> 
>         down_write locked ---(1)----------------------┐
>                 |                                     |
>                 |   down_read waiting ---(2)----┐     |
>                 |           |               ┌-------------┐ ┌-------------┐
>         bch_btree_set_root ===(3)========>> | c->root   A | | c->root   B |
>                 |           |               └-------------┘ └-------------┘
>             up_write ---(4)---------------------┘     |            |
>                             |                         |            |
>                     down_read locked ---(5)-----------┘            |
>                             |                                      |
>                         up_read ---(6)-----------------------------┘
> 
> Since c->root may change, the correct way to lock c->root is the same
> as in bch_root_usage(): re-check c->root after taking the lock, and
> retry if it changed.
> 
> static unsigned int bch_root_usage(struct cache_set *c)
> {
>         unsigned int bytes = 0;
>         struct bkey *k;
>         struct btree *b;
>         struct btree_iter iter;
> 
>         goto lock_root;
> 
>         do {
>                 rw_unlock(false, b);
> lock_root:
>                 b = c->root;
>                 rw_lock(false, b, b->level);
>         } while (b != c->root);
> 
>         for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad)
>                 bytes += bkey_bytes(k);
> 
>         rw_unlock(false, b);
> 
>         return (bytes * 100) / btree_bytes(c);
> }
> 
> Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
> Cc: stable@vger.kernel.org

Nice catch, added into my for-next queue.

Thanks for fixing up.

Coly Li


> ---
>  drivers/md/bcache/writeback.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 24c049067f61..bac916ba08c8 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -977,14 +977,22 @@ static int bch_btre_dirty_init_thread_nr(void)
>  void bch_sectors_dirty_init(struct bcache_device *d)
>  {
>  	int i;
> +	struct btree *b = NULL;
>  	struct bkey *k = NULL;
>  	struct btree_iter iter;
>  	struct sectors_dirty_init op;
>  	struct cache_set *c = d->c;
>  	struct bch_dirty_init_state state;
>  
> +retry_lock:
> +	b = c->root;
> +	rw_lock(0, b, b->level);
> +	if (b != c->root) {
> +		rw_unlock(0, b);
> +		goto retry_lock;
> +	}
> +
>  	/* Just count root keys if no leaf node */
> -	rw_lock(0, c->root, c->root->level);
>  	if (c->root->level == 0) {
>  		bch_btree_op_init(&op.op, -1);
>  		op.inode = d->id;
> @@ -994,7 +1002,7 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>  				    k, &iter, bch_ptr_invalid)
>  			sectors_dirty_init_fn(&op.op, c->root, k);
>  
> -		rw_unlock(0, c->root);
> +		rw_unlock(0, b);
>  		return;
>  	}
>  
> @@ -1030,7 +1038,7 @@ void bch_sectors_dirty_init(struct bcache_device *d)
>  out:
>  	/* Must wait for all threads to stop. */
>  	wait_event(state.wait, atomic_read(&state.started) == 0);
> -	rw_unlock(0, c->root);
> +	rw_unlock(0, b);
>  }
>  
>  void bch_cached_dev_writeback_init(struct cached_dev *dc)
> -- 
> 2.17.1.windows.2
> 

-- 
Coly Li

Thread overview: 2 messages
2023-09-08  3:41 [PATCH v2] bcache: fixup lock c->root error Mingzhe Zou
2023-09-08 10:26 ` Coly Li [this message]
