LKML Archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net 0/2] net/smc: Fix for race in smc link group termination
@ 2021-12-28  9:20 Wen Gu
  2021-12-28  9:20 ` [RFC PATCH net 1/2] net/smc: Resolve the race between link group access and termination Wen Gu
  2021-12-28  9:20 ` [RFC PATCH net 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu
  0 siblings, 2 replies; 4+ messages in thread
From: Wen Gu @ 2021-12-28  9:20 UTC (permalink / raw
  To: kgraul, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu

We encountered some crashes recently and they are caused by the
race between the access and free of link/link group in smc link
group termination. The crashes can be reproduced in frequent
abnormal link group termination, like set RNICs up/down.

This set of patches tries to fix this by extending the life cycle
of link/link group to ensure that they won't be referred to after
cleared or freed.

Best wishes,
Wen Gu

Wen Gu (2):
  net/smc: Resolve the race between link group access and termination
  net/smc: Resolve the race between SMC-R link access and clear

 net/smc/smc.h      |  1 +
 net/smc/smc_core.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++------
 net/smc/smc_core.h |  7 +++++
 3 files changed, 79 insertions(+), 9 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH net 1/2] net/smc: Resolve the race between link group access and termination
  2021-12-28  9:20 [RFC PATCH net 0/2] net/smc: Fix for race in smc link group termination Wen Gu
@ 2021-12-28  9:20 ` Wen Gu
  2021-12-28  9:20 ` [RFC PATCH net 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu
  1 sibling, 0 replies; 4+ messages in thread
From: Wen Gu @ 2021-12-28  9:20 UTC (permalink / raw
  To: kgraul, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu

We encountered some crashes caused by the race between the access
and the termination of link groups.

Here are some of panic stacks we met:

1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()

 BUG: kernel NULL pointer dereference, address: 00000000000002f0
 Workqueue: smc_hs_wq smc_listen_work [smc]
 RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
 Call Trace:
  <TASK>
  ? smc_clc_send_accept+0x45/0xa0 [smc]
  ? smc_clc_send_accept+0x45/0xa0 [smc]
  smc_listen_work+0x783/0x1220 [smc]
  ? finish_task_switch+0xc4/0x2e0
  ? process_one_work+0x1ad/0x3c0
  process_one_work+0x1ad/0x3c0
  worker_thread+0x4c/0x390
  ? rescuer_thread+0x320/0x320
  kthread+0x149/0x190
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x30
  </TASK>

smc_listen_work()                abnormal case like port error
---------------------------------------------------------------
                                | __smc_lgr_terminate()
                                |     |- smc_conn_kill()
                                |            |- smc_lgr_unregister_conn()
                                |                   |- set conn->lgr = NULL
smc_clc_wait_msg()              |
    |- access conn->lgr (panic) |

2) Race between smc_setsockopt() and __smc_lgr_terminate()

 BUG: kernel NULL pointer dereference, address: 00000000000002e8
 RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
 Call Trace:
  <TASK>
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  </TASK>

smc_setsockopt()                 abnormal case like port error
--------------------------------------------------------------
                                | __smc_lgr_terminate()
                                |     |- smc_conn_kill()
                                |            |- smc_lgr_unregister_conn()
                                |                   |- set conn->lgr = NULL
mod_delayed_work()              |
    |- access conn->lgr (panic) |

There are some other panic points and they are caused by the
simmilar reason as described above, which is accessing link
group after termination, thus getting a NULL pointer or invalid
resource.

Currently, there seems to be no synchronization between the
link group access and a sudden termination of it. This patch
tries to fix this by introducing reference count of link group
and not freeing link group until reference count is zero.

Link group might be referred to by link or smc connection. So
the operation to the link group reference count can be concluded
as follows:

object          [hold or initialized as 1]         [put]
--------------------------------------------------------------------
link group      smc_lgr_create()                   smc_lgr_free()
connections     smc_lgr_register_conn()            smc_conn_free()
links           smcr_link_init()                   smcr_link_clear()

Througth this way, we extend the life cycle of link group and
ensure it is longer than the life cycle of connections and links
above it, so that avoid invalid access to link group after its
termination.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc.h      |  1 +
 net/smc/smc_core.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
 net/smc/smc_core.h |  3 +++
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/net/smc/smc.h b/net/smc/smc.h
index 1a4fc1c..3d0b8e3 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -221,6 +221,7 @@ struct smc_connection {
 						 */
 	u64			peer_token;	/* SMC-D token of peer */
 	u8			killed : 1;	/* abnormal termination */
+	u8			freed : 1;	/* normal termiation */
 	u8			out_of_sync : 1; /* out of sync with peer */
 };
 
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 1f40b8e..d72eb13 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -184,6 +184,7 @@ static int smc_lgr_register_conn(struct smc_connection *conn, bool first)
 			conn->alert_token_local = 0;
 	}
 	smc_lgr_add_alert_token(conn);
+	smc_lgr_hold(conn->lgr); /* lgr_put in smc_conn_free() */
 	conn->lgr->conns_num++;
 	return 0;
 }
@@ -216,7 +217,6 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn)
 		__smc_lgr_unregister_conn(conn);
 	}
 	write_unlock_bh(&lgr->conns_lock);
-	conn->lgr = NULL;
 }
 
 int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb)
@@ -749,6 +749,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
 	lnk->link_id = smcr_next_link_id(lgr);
 	lnk->lgr = lgr;
+	smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
 	lnk->link_idx = link_idx;
 	smc_ibdev_cnt_inc(lnk);
 	smcr_copy_dev_info_to_link(lnk);
@@ -841,6 +842,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
 	lgr->terminating = 0;
 	lgr->freeing = 0;
 	lgr->vlan_id = ini->vlan_id;
+	refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */
 	mutex_init(&lgr->sndbufs_lock);
 	mutex_init(&lgr->rmbs_lock);
 	rwlock_init(&lgr->conns_lock);
@@ -1120,8 +1122,20 @@ void smc_conn_free(struct smc_connection *conn)
 {
 	struct smc_link_group *lgr = conn->lgr;
 
-	if (!lgr)
+	if (!lgr || conn->freed)
+		/* smc connection wasn't registered to a link group
+		 * or has already been freed before.
+		 *
+		 * Judge these to ensure that lgr refcnt will be put
+		 * only once if connection has been registered to a
+		 * link group successfully.
+		 */
 		return;
+
+	conn->freed = 1;
+	if (conn->killed)
+		goto lgr_put;
+
 	if (lgr->is_smcd) {
 		if (!list_empty(&lgr->list))
 			smc_ism_unset_conn(conn);
@@ -1138,6 +1152,8 @@ void smc_conn_free(struct smc_connection *conn)
 
 	if (!lgr->conns_num)
 		smc_lgr_schedule_free_work(lgr);
+lgr_put:
+	smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */
 }
 
 /* unregister a link from a buf_desc */
@@ -1209,6 +1225,7 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
 	smc_ib_destroy_queue_pair(lnk);
 	smc_ib_dealloc_protection_domain(lnk);
 	smc_wr_free_link_mem(lnk);
+	smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
 	smc_ibdev_cnt_dec(lnk);
 	put_device(&lnk->smcibdev->ibdev->dev);
 	smcibdev = lnk->smcibdev;
@@ -1283,6 +1300,15 @@ static void smc_lgr_free_bufs(struct smc_link_group *lgr)
 	__smc_lgr_free_bufs(lgr, true);
 }
 
+/* won't be freed until no one accesses to lgr anymore */
+static void __smc_lgr_free(struct smc_link_group *lgr)
+{
+	smc_lgr_free_bufs(lgr);
+	if (!lgr->is_smcd)
+		smc_wr_free_lgr_mem(lgr);
+	kfree(lgr);
+}
+
 /* remove a link group */
 static void smc_lgr_free(struct smc_link_group *lgr)
 {
@@ -1298,7 +1324,6 @@ static void smc_lgr_free(struct smc_link_group *lgr)
 		smc_llc_lgr_clear(lgr);
 	}
 
-	smc_lgr_free_bufs(lgr);
 	destroy_workqueue(lgr->tx_wq);
 	if (lgr->is_smcd) {
 		smc_ism_put_vlan(lgr->smcd, lgr->vlan_id);
@@ -1306,11 +1331,21 @@ static void smc_lgr_free(struct smc_link_group *lgr)
 		if (!atomic_dec_return(&lgr->smcd->lgr_cnt))
 			wake_up(&lgr->smcd->lgrs_deleted);
 	} else {
-		smc_wr_free_lgr_mem(lgr);
 		if (!atomic_dec_return(&lgr_cnt))
 			wake_up(&lgrs_deleted);
 	}
-	kfree(lgr);
+	smc_lgr_put(lgr); /* theoretically last lgr_put */
+}
+
+void smc_lgr_hold(struct smc_link_group *lgr)
+{
+	refcount_inc(&lgr->refcnt);
+}
+
+void smc_lgr_put(struct smc_link_group *lgr)
+{
+	if (refcount_dec_and_test(&lgr->refcnt))
+		__smc_lgr_free(lgr);
 }
 
 static void smc_sk_wake_ups(struct smc_sock *smc)
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index d63b082..51203b1 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -249,6 +249,7 @@ struct smc_link_group {
 	u8			terminating : 1;/* lgr is terminating */
 	u8			freeing : 1;	/* lgr is being freed */
 
+	refcount_t		refcnt;		/* lgr reference count */
 	bool			is_smcd;	/* SMC-R or SMC-D */
 	u8			smc_version;
 	u8			negotiated_eid[SMC_MAX_EID_LEN];
@@ -470,6 +471,8 @@ static inline void smc_set_pci_values(struct pci_dev *pci_dev,
 
 void smc_lgr_cleanup_early(struct smc_connection *conn);
 void smc_lgr_terminate_sched(struct smc_link_group *lgr);
+void smc_lgr_hold(struct smc_link_group *lgr);
+void smc_lgr_put(struct smc_link_group *lgr);
 void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport);
 void smcr_port_err(struct smc_ib_device *smcibdev, u8 ibport);
 void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [RFC PATCH net 2/2] net/smc: Resolve the race between SMC-R link access and clear
  2021-12-28  9:20 [RFC PATCH net 0/2] net/smc: Fix for race in smc link group termination Wen Gu
  2021-12-28  9:20 ` [RFC PATCH net 1/2] net/smc: Resolve the race between link group access and termination Wen Gu
@ 2021-12-28  9:20 ` Wen Gu
  2021-12-28 14:50   ` Wen Gu
  1 sibling, 1 reply; 4+ messages in thread
From: Wen Gu @ 2021-12-28  9:20 UTC (permalink / raw
  To: kgraul, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu

We encountered some crashes caused by the race between SMC-R
link access and link clear triggered by link group termination
in abnormal case, like port error.

Here are some of panic stacks we met:

1) Race between smc_llc_flow_initiate() and smcr_link_clear()

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 Workqueue: smc_hs_wq smc_listen_work [smc]
 RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
 Call Trace:
  <TASK>
  ? __smc_buf_create+0x75a/0x950 [smc]
  smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
  smc_listen_work+0xf72/0x1230 [smc]
  ? process_one_work+0x25c/0x600
  process_one_work+0x25c/0x600
  worker_thread+0x4f/0x3a0
  ? process_one_work+0x600/0x600
  kthread+0x15d/0x1a0
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x30
  </TASK>

smc_listen_work()                       __smc_lgr_terminate()
---------------------------------------------------------------
                                       | smc_lgr_free()
                                       |     |- smcr_link_clear()
                                       |            |- memset(lnk, 0)
smc_listen_rdma_reg()                  |
  |- smcr_lgr_reg_rmbs()               |
       |- smc_llc_flow_initiate()      |
            |- access lnk->lgr (panic) |

2) Race between smc_wr_tx_dismiss_slots() and smcr_link_clear()

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 RIP: 0010:_find_first_bit+0x8/0x50
 Call Trace:
  <TASK>
  smc_wr_tx_dismiss_slots+0x34/0xc0 [smc]
  ? smc_cdc_tx_filter+0x10/0x10 [smc]
  smc_conn_free+0xd8/0x100 [smc]
  __smc_release+0xf1/0x140 [smc]
  smc_release+0x89/0x1b0 [smc]
  __sock_release+0x37/0xb0
  sock_close+0x14/0x20
  __fput+0xa9/0x260
  task_work_run+0x6b/0xb0
  do_exit+0x3ef/0xd40
  do_group_exit+0x47/0xb0
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  </TASK>

smc_conn_free()                           __smc_lgr_terminate()
----------------------------------------------------------------
                                         | smc_lgr_free()
                                         |  |- smcr_link_clear()
                                         |      |- smc_wr_free_link_mem()
                                         |          |- lnk->wr_tx_mask = NULL;
smc_wr_tx_dismiss_slots()                |
  |- for_each_set_bit(link->wr_tx_mask)  |
            |- (panic)                   |

These crashes are caused by clearing SMC-R link resources
when someone is still accessing to them. So this patch tries
to fix it by introducing reference count of SMC-R links and
ensuring that the sensitive resources of links are not
cleared until reference count is zero.

The operation to the SMC-R link reference count can be concluded
as follows:

object          [hold or initialized as 1]         [put]
--------------------------------------------------------------------
links           smcr_link_init()                   smcr_link_clear()
connections     smcr_lgr_conn_assign_link()        smc_conn_free()

Through this way, the clear of SMC-R links is later than the
free of all the smc connections above it, thus avoiding the
unsafe reference to SMC-R links.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_core.c | 43 +++++++++++++++++++++++++++++++++++--------
 net/smc/smc_core.h |  4 ++++
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d72eb13..83a80e6 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -155,6 +155,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first)
 	if (!conn->lnk)
 		return SMC_CLC_DECL_NOACTLINK;
 	atomic_inc(&conn->lnk->conn_cnt);
+	smcr_link_hold(conn->lnk); /* link_put in smc_conn_free() */
 	return 0;
 }
 
@@ -746,6 +747,8 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	}
 	get_device(&lnk->smcibdev->ibdev->dev);
 	atomic_inc(&lnk->smcibdev->lnk_cnt);
+	refcount_set(&lnk->refcnt, 1); /* link refcnt is set to 1 */
+	lnk->clearing = 0;
 	lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
 	lnk->link_id = smcr_next_link_id(lgr);
 	lnk->lgr = lgr;
@@ -994,8 +997,12 @@ void smc_switch_link_and_count(struct smc_connection *conn,
 			       struct smc_link *to_lnk)
 {
 	atomic_dec(&conn->lnk->conn_cnt);
+	/* put old link, hold in smcr_lgr_conn_assign_link() */
+	smcr_link_put(conn->lnk);
 	conn->lnk = to_lnk;
 	atomic_inc(&conn->lnk->conn_cnt);
+	/* hold new link, put in smc_conn_free() */
+	smcr_link_hold(conn->lnk);
 }
 
 struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
@@ -1126,9 +1133,9 @@ void smc_conn_free(struct smc_connection *conn)
 		/* smc connection wasn't registered to a link group
 		 * or has already been freed before.
 		 *
-		 * Judge these to ensure that lgr refcnt will be put
-		 * only once if connection has been registered to a
-		 * link group successfully.
+		 * Judge these to ensure that lgr/link refcnt will be
+		 * put only once if connection has been registered to
+		 * a link group successfully.
 		 */
 		return;
 
@@ -1153,6 +1160,8 @@ void smc_conn_free(struct smc_connection *conn)
 	if (!lgr->conns_num)
 		smc_lgr_schedule_free_work(lgr);
 lgr_put:
+	if (!lgr->is_smcd)
+		smcr_link_put(conn->lnk); /* link_hold in smcr_lgr_conn_assign_link() */
 	smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */
 }
 
@@ -1209,13 +1218,23 @@ static void smcr_rtoken_clear_link(struct smc_link *lnk)
 	}
 }
 
+void __smcr_link_clear(struct smc_link *lnk)
+{
+	smc_wr_free_link_mem(lnk);
+	smc_lgr_put(lnk->lgr);	/* lgr_hold in smcr_link_init() */
+	memset(lnk, 0, sizeof(struct smc_link));
+	lnk->state = SMC_LNK_UNUSED;
+}
+
 /* must be called under lgr->llc_conf_mutex lock */
 void smcr_link_clear(struct smc_link *lnk, bool log)
 {
 	struct smc_ib_device *smcibdev;
 
-	if (!lnk->lgr || lnk->state == SMC_LNK_UNUSED)
+	if (lnk->clearing || !lnk->lgr ||
+	    lnk->state == SMC_LNK_UNUSED)
 		return;
+	lnk->clearing = 1;
 	lnk->peer_qpn = 0;
 	smc_llc_link_clear(lnk, log);
 	smcr_buf_unmap_lgr(lnk);
@@ -1224,15 +1243,23 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
 	smc_wr_free_link(lnk);
 	smc_ib_destroy_queue_pair(lnk);
 	smc_ib_dealloc_protection_domain(lnk);
-	smc_wr_free_link_mem(lnk);
-	smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
 	smc_ibdev_cnt_dec(lnk);
 	put_device(&lnk->smcibdev->ibdev->dev);
 	smcibdev = lnk->smcibdev;
-	memset(lnk, 0, sizeof(struct smc_link));
-	lnk->state = SMC_LNK_UNUSED;
 	if (!atomic_dec_return(&smcibdev->lnk_cnt))
 		wake_up(&smcibdev->lnks_deleted);
+	smcr_link_put(lnk); /* theoretically last link_put */
+}
+
+void smcr_link_hold(struct smc_link *lnk)
+{
+	refcount_inc(&lnk->refcnt);
+}
+
+void smcr_link_put(struct smc_link *lnk)
+{
+	if (refcount_dec_and_test(&lnk->refcnt))
+		__smcr_link_clear(lnk);
 }
 
 static void smcr_buf_free(struct smc_link_group *lgr, bool is_rmb,
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 51203b1..e73217f 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -137,6 +137,8 @@ struct smc_link {
 	u8			peer_link_uid[SMC_LGR_ID_SIZE]; /* peer uid */
 	u8			link_idx;	/* index in lgr link array */
 	u8			link_is_asym;	/* is link asymmetric? */
+	u8			clearing : 1;	/* link is being cleared */
+	refcount_t		refcnt;		/* link reference count */
 	struct smc_link_group	*lgr;		/* parent link group */
 	struct work_struct	link_down_wrk;	/* wrk to bring link down */
 	char			ibname[IB_DEVICE_NAME_MAX]; /* ib device name */
@@ -504,6 +506,8 @@ void smc_rtoken_set2(struct smc_link_group *lgr, int rtok_idx, int link_id,
 int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 		   u8 link_idx, struct smc_init_info *ini);
 void smcr_link_clear(struct smc_link *lnk, bool log);
+void smcr_link_hold(struct smc_link *lnk);
+void smcr_link_put(struct smc_link *lnk);
 void smc_switch_link_and_count(struct smc_connection *conn,
 			       struct smc_link *to_lnk);
 int smcr_buf_map_lgr(struct smc_link *lnk);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH net 2/2] net/smc: Resolve the race between SMC-R link access and clear
  2021-12-28  9:20 ` [RFC PATCH net 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu
@ 2021-12-28 14:50   ` Wen Gu
  0 siblings, 0 replies; 4+ messages in thread
From: Wen Gu @ 2021-12-28 14:50 UTC (permalink / raw
  To: kgraul, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu


There is an auto build test WARNING in this patch, function 
__smcr_link_clear() should be declared as 'static'. I will
send a v2 of this RFC patch set.

On 2021/12/28 5:20 pm, Wen Gu wrote:
> We encountered some crashes caused by the race between SMC-R
> link access and link clear triggered by link group termination
> in abnormal case, like port error.
> 
> Here are some of panic stacks we met:
> 
> 1) Race between smc_llc_flow_initiate() and smcr_link_clear()
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>   Workqueue: smc_hs_wq smc_listen_work [smc]
>   RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc]
>   Call Trace:
>    <TASK>
>    ? __smc_buf_create+0x75a/0x950 [smc]
>    smcr_lgr_reg_rmbs+0x2a/0xbf [smc]
>    smc_listen_work+0xf72/0x1230 [smc]
>    ? process_one_work+0x25c/0x600
>    process_one_work+0x25c/0x600
>    worker_thread+0x4f/0x3a0
>    ? process_one_work+0x600/0x600
>    kthread+0x15d/0x1a0
>    ? set_kthread_struct+0x40/0x40
>    ret_from_fork+0x1f/0x30
>    </TASK>
> 
> smc_listen_work()                       __smc_lgr_terminate()
> ---------------------------------------------------------------
>                                         | smc_lgr_free()
>                                         |     |- smcr_link_clear()
>                                         |            |- memset(lnk, 0)
> smc_listen_rdma_reg()                  |
>    |- smcr_lgr_reg_rmbs()               |
>         |- smc_llc_flow_initiate()      |
>              |- access lnk->lgr (panic) |
> 
> 2) Race between smc_wr_tx_dismiss_slots() and smcr_link_clear()
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>   RIP: 0010:_find_first_bit+0x8/0x50
>   Call Trace:
>    <TASK>
>    smc_wr_tx_dismiss_slots+0x34/0xc0 [smc]
>    ? smc_cdc_tx_filter+0x10/0x10 [smc]
>    smc_conn_free+0xd8/0x100 [smc]
>    __smc_release+0xf1/0x140 [smc]
>    smc_release+0x89/0x1b0 [smc]
>    __sock_release+0x37/0xb0
>    sock_close+0x14/0x20
>    __fput+0xa9/0x260
>    task_work_run+0x6b/0xb0
>    do_exit+0x3ef/0xd40
>    do_group_exit+0x47/0xb0
>    __x64_sys_exit_group+0x14/0x20
>    do_syscall_64+0x34/0x90
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>    </TASK>
> 
> smc_conn_free()                           __smc_lgr_terminate()
> ----------------------------------------------------------------
>                                           | smc_lgr_free()
>                                           |  |- smcr_link_clear()
>                                           |      |- smc_wr_free_link_mem()
>                                           |          |- lnk->wr_tx_mask = NULL;
> smc_wr_tx_dismiss_slots()                |
>    |- for_each_set_bit(link->wr_tx_mask)  |
>              |- (panic)                   |
> 
> These crashes are caused by clearing SMC-R link resources
> when someone is still accessing to them. So this patch tries
> to fix it by introducing reference count of SMC-R links and
> ensuring that the sensitive resources of links are not
> cleared until reference count is zero.
> 
> The operation to the SMC-R link reference count can be concluded
> as follows:
> 
> object          [hold or initialized as 1]         [put]
> --------------------------------------------------------------------
> links           smcr_link_init()                   smcr_link_clear()
> connections     smcr_lgr_conn_assign_link()        smc_conn_free()
> 
> Through this way, the clear of SMC-R links is later than the
> free of all the smc connections above it, thus avoiding the
> unsafe reference to SMC-R links.
> 
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>   net/smc/smc_core.c | 43 +++++++++++++++++++++++++++++++++++--------
>   net/smc/smc_core.h |  4 ++++
>   2 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index d72eb13..83a80e6 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -155,6 +155,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first)
>   	if (!conn->lnk)
>   		return SMC_CLC_DECL_NOACTLINK;
>   	atomic_inc(&conn->lnk->conn_cnt);
> +	smcr_link_hold(conn->lnk); /* link_put in smc_conn_free() */
>   	return 0;
>   }
>   
> @@ -746,6 +747,8 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
>   	}
>   	get_device(&lnk->smcibdev->ibdev->dev);
>   	atomic_inc(&lnk->smcibdev->lnk_cnt);
> +	refcount_set(&lnk->refcnt, 1); /* link refcnt is set to 1 */
> +	lnk->clearing = 0;
>   	lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
>   	lnk->link_id = smcr_next_link_id(lgr);
>   	lnk->lgr = lgr;
> @@ -994,8 +997,12 @@ void smc_switch_link_and_count(struct smc_connection *conn,
>   			       struct smc_link *to_lnk)
>   {
>   	atomic_dec(&conn->lnk->conn_cnt);
> +	/* put old link, hold in smcr_lgr_conn_assign_link() */
> +	smcr_link_put(conn->lnk);
>   	conn->lnk = to_lnk;
>   	atomic_inc(&conn->lnk->conn_cnt);
> +	/* hold new link, put in smc_conn_free() */
> +	smcr_link_hold(conn->lnk);
>   }
>   
>   struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
> @@ -1126,9 +1133,9 @@ void smc_conn_free(struct smc_connection *conn)
>   		/* smc connection wasn't registered to a link group
>   		 * or has already been freed before.
>   		 *
> -		 * Judge these to ensure that lgr refcnt will be put
> -		 * only once if connection has been registered to a
> -		 * link group successfully.
> +		 * Judge these to ensure that lgr/link refcnt will be
> +		 * put only once if connection has been registered to
> +		 * a link group successfully.
>   		 */
>   		return;
>   
> @@ -1153,6 +1160,8 @@ void smc_conn_free(struct smc_connection *conn)
>   	if (!lgr->conns_num)
>   		smc_lgr_schedule_free_work(lgr);
>   lgr_put:
> +	if (!lgr->is_smcd)
> +		smcr_link_put(conn->lnk); /* link_hold in smcr_lgr_conn_assign_link() */
>   	smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */
>   }
>   
> @@ -1209,13 +1218,23 @@ static void smcr_rtoken_clear_link(struct smc_link *lnk)
>   	}
>   }
>   
> +void __smcr_link_clear(struct smc_link *lnk)
> +{
> +	smc_wr_free_link_mem(lnk);
> +	smc_lgr_put(lnk->lgr);	/* lgr_hold in smcr_link_init() */
> +	memset(lnk, 0, sizeof(struct smc_link));
> +	lnk->state = SMC_LNK_UNUSED;
> +}
> +
>   /* must be called under lgr->llc_conf_mutex lock */
>   void smcr_link_clear(struct smc_link *lnk, bool log)
>   {
>   	struct smc_ib_device *smcibdev;
>   
> -	if (!lnk->lgr || lnk->state == SMC_LNK_UNUSED)
> +	if (lnk->clearing || !lnk->lgr ||
> +	    lnk->state == SMC_LNK_UNUSED)
>   		return;
> +	lnk->clearing = 1;
>   	lnk->peer_qpn = 0;
>   	smc_llc_link_clear(lnk, log);
>   	smcr_buf_unmap_lgr(lnk);
> @@ -1224,15 +1243,23 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
>   	smc_wr_free_link(lnk);
>   	smc_ib_destroy_queue_pair(lnk);
>   	smc_ib_dealloc_protection_domain(lnk);
> -	smc_wr_free_link_mem(lnk);
> -	smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */
>   	smc_ibdev_cnt_dec(lnk);
>   	put_device(&lnk->smcibdev->ibdev->dev);
>   	smcibdev = lnk->smcibdev;
> -	memset(lnk, 0, sizeof(struct smc_link));
> -	lnk->state = SMC_LNK_UNUSED;
>   	if (!atomic_dec_return(&smcibdev->lnk_cnt))
>   		wake_up(&smcibdev->lnks_deleted);
> +	smcr_link_put(lnk); /* theoretically last link_put */
> +}
> +
> +void smcr_link_hold(struct smc_link *lnk)
> +{
> +	refcount_inc(&lnk->refcnt);
> +}
> +
> +void smcr_link_put(struct smc_link *lnk)
> +{
> +	if (refcount_dec_and_test(&lnk->refcnt))
> +		__smcr_link_clear(lnk);
>   }
>   
>   static void smcr_buf_free(struct smc_link_group *lgr, bool is_rmb,
> diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
> index 51203b1..e73217f 100644
> --- a/net/smc/smc_core.h
> +++ b/net/smc/smc_core.h
> @@ -137,6 +137,8 @@ struct smc_link {
>   	u8			peer_link_uid[SMC_LGR_ID_SIZE]; /* peer uid */
>   	u8			link_idx;	/* index in lgr link array */
>   	u8			link_is_asym;	/* is link asymmetric? */
> +	u8			clearing : 1;	/* link is being cleared */
> +	refcount_t		refcnt;		/* link reference count */
>   	struct smc_link_group	*lgr;		/* parent link group */
>   	struct work_struct	link_down_wrk;	/* wrk to bring link down */
>   	char			ibname[IB_DEVICE_NAME_MAX]; /* ib device name */
> @@ -504,6 +506,8 @@ void smc_rtoken_set2(struct smc_link_group *lgr, int rtok_idx, int link_id,
>   int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
>   		   u8 link_idx, struct smc_init_info *ini);
>   void smcr_link_clear(struct smc_link *lnk, bool log);
> +void smcr_link_hold(struct smc_link *lnk);
> +void smcr_link_put(struct smc_link *lnk);
>   void smc_switch_link_and_count(struct smc_connection *conn,
>   			       struct smc_link *to_lnk);
>   int smcr_buf_map_lgr(struct smc_link *lnk);

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-28 14:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-28  9:20 [RFC PATCH net 0/2] net/smc: Fix for race in smc link group termination Wen Gu
2021-12-28  9:20 ` [RFC PATCH net 1/2] net/smc: Resolve the race between link group access and termination Wen Gu
2021-12-28  9:20 ` [RFC PATCH net 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu
2021-12-28 14:50   ` Wen Gu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).