* cassini: possible recursive locking detected
@ 2014-05-06  9:39 Meelis Roos
  2014-05-08 12:53 ` Emil Goode
  0 siblings, 1 reply; 9+ messages in thread
From: Meelis Roos @ 2014-05-06  9:39 UTC
  To: netdev

While installing Linux on a Sun Fire V480, any traffic on the built-in
cassini NIC caused a hang. I worked around this by using a Broadcom NIC
and tried a kernel with most debugging options enabled. This resulted in
the following warning. Maybe this is the deadlock I was seeing?

[   88.316595] =============================================
[   88.316597] [ INFO: possible recursive locking detected ]
[   88.316603] 3.15.0-rc4-00202-g30321c7-dirty #11 Not tainted
[   88.316605] ---------------------------------------------
[   88.316608] swapper/3/1 is trying to acquire lock:
[   88.316644]  (&(&cp->tx_lock[i])->rlock){..-...}, at: [<0000000000745da0>] cas_link_timer+0xa0/0x460
[   88.316646]
[   88.316646] but task is already holding lock:
[   88.316657]  (&(&cp->tx_lock[i])->rlock){..-...}, at: [<0000000000745da0>] cas_link_timer+0xa0/0x460
[   88.316659]
[   88.316659] other info that might help us debug this:
[   88.316661]  Possible unsafe locking scenario:
[   88.316661]
[   88.316662]        CPU0
[   88.316664]        ----
[   88.316668]   lock(&(&cp->tx_lock[i])->rlock);
[   88.316671]   lock(&(&cp->tx_lock[i])->rlock);
[   88.316672]
[   88.316672]  *** DEADLOCK ***
[   88.316672]
[   88.316674]  May be due to missing lock nesting notation
[   88.316674]
[   88.316677] 3 locks held by swapper/3/1:
[   88.316694]  #0:  ((&cp->link_timer)){+.-...}, at: [<0000000000465f80>] call_timer_fn+0x0/0xe0
[   88.316706]  #1:  (&(&cp->lock)->rlock){..-...}, at: [<0000000000745d80>] cas_link_timer+0x80/0x460
[   88.316716]  #2:  (&(&cp->tx_lock[i])->rlock){..-...}, at: [<0000000000745da0>] cas_link_timer+0xa0/0x460
[   88.316718]
[   88.316718] stack backtrace:
[   88.316724] CPU: 2 PID: 1 Comm: swapper/3 Not tainted 3.15.0-rc4-00202-g30321c7-dirty #11
[   88.316727] Call Trace:
[   88.316743]  [00000000004a2c5c] __lock_acquire+0x10fc/0x1fa0
[   88.316749]  [00000000004a406c] lock_acquire+0x4c/0x80
[   88.316760]  [000000000083e07c] _raw_spin_lock+0x1c/0x40
[   88.316765]  [0000000000745da0] cas_link_timer+0xa0/0x460
[   88.316769]  [0000000000465fc8] call_timer_fn+0x48/0xe0
[   88.316775]  [00000000004665d4] run_timer_softirq+0x214/0x280
[   88.316788]  [000000000045f650] __do_softirq+0xf0/0x240
[   88.316800]  [000000000042bd0c] do_softirq_own_stack+0x2c/0x40
[   88.316804]  [000000000045fb44] irq_exit+0xc4/0xe0
[   88.316814]  [000000000042fcc8] timer_interrupt+0x88/0xc0
[   88.316819]  [0000000000426b84] valid_addr_bitmap_patch+0xbc/0x238
[   88.316826]  [00000000004ab2f8] vprintk_emit+0x1d8/0x540
[   88.316842]  [0000000000835fb8] printk+0x34/0x48
[   88.316847]  [00000000004ac3e0] register_console+0x340/0x3e0
[   88.316862]  [0000000000a74f2c] init_netconsole+0x180/0x20c
[   88.316867]  [0000000000426eb0] do_one_initcall+0x110/0x1a0
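
For reading the `{..-...}` annotations in the report above, here is a small sketch in C. It assumes the six-character layout described in the kernel's lockdep design document for kernels of this era (hardirq, hardirq-read, softirq, softirq-read, reclaim-fs, reclaim-fs-read). The function and array names are made up for illustration and are not kernel APIs.

```c
#include <string.h>

/* Meaning of one usage character, as worded in the lockdep design notes
 * (quoted from memory, so treat the exact wording as an assumption). */
static const char *usage_char_meaning(char c)
{
    switch (c) {
    case '.': return "acquired while irqs disabled and not in irq context";
    case '-': return "acquired in irq context";
    case '+': return "acquired with irqs enabled";
    case '?': return "acquired in irq context with irqs enabled";
    default:  return "unknown";
    }
}

/* Assumed ordering of the six positions inside the braces. */
static const char *const usage_slot[6] = {
    "hardirq", "hardirq-read", "softirq",
    "softirq-read", "reclaim-fs", "reclaim-fs-read",
};

/* Return the index of the first position that is not '.', or -1 if the
 * field is not of the form "{......}" or every position is '.'. */
static int first_marked_slot(const char *field)
{
    if (strlen(field) != 8 || field[0] != '{' || field[7] != '}')
        return -1;
    for (int i = 0; i < 6; i++) {
        if (field[1 + i] != '.')
            return i;
    }
    return -1;
}
```

Under that assumption, the `-` in the third position of `{..-...}` says the tx_lock class has been taken in softirq context, which fits the timer softirq path in the call trace.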

-- 
Meelis Roos (mroos@linux.ee)


* Re: cassini: possible recursive locking detected
  2014-05-06  9:39 cassini: possible recursive locking detected Meelis Roos
@ 2014-05-08 12:53 ` Emil Goode
  2014-05-08 20:00   ` Meelis Roos
  0 siblings, 1 reply; 9+ messages in thread
From: Emil Goode @ 2014-05-08 12:53 UTC
  To: Meelis Roos; +Cc: netdev

Hello Meelis,

I think this warning happens because we acquire multiple locks
in a loop in cas_lock_tx() and I believe we should use nested
lock annotation here.

Perhaps you would like to try the attached patch?

It won't fix the deadlock that you mentioned though.

Best regards,

Emil Goode
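
To illustrate why the validator complains: lockdep tracks lock classes rather than individual lock instances, and the report names a single class, `&(&cp->tx_lock[i])->rlock`, for every ring lock, presumably because they are all initialised at one site. The following userspace model is only a sketch of that idea. The struct and function names and the ring count of 4 are hypothetical, and it is far simpler than the real validator.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_RINGS  4   /* hypothetical ring count for this model */
#define MAX_HELD 16

/* The model keys its check on (class, subclass), not on the lock address. */
struct held_lock {
    const void *class_key;  /* shared by all locks initialised at one site */
    unsigned int subclass;  /* 0 for a plain spin_lock() */
};

struct task_model {
    struct held_lock held[MAX_HELD];
    size_t depth;
};

/* Record an acquisition; return true if a lock with the same class and
 * subclass is already held, which is what gets reported as recursion. */
static bool model_acquire(struct task_model *t, const void *class_key,
                          unsigned int subclass)
{
    bool recursive = false;

    for (size_t i = 0; i < t->depth; i++) {
        if (t->held[i].class_key == class_key &&
            t->held[i].subclass == subclass)
            recursive = true;
    }
    if (t->depth < MAX_HELD) {
        t->held[t->depth].class_key = class_key;
        t->held[t->depth].subclass = subclass;
        t->depth++;
    }
    return recursive;
}

/* Take every ring lock in order and count the flagged acquisitions.
 * With nested == true the ring index is used as the subclass. */
static int model_lock_all_rings(bool nested)
{
    static const char tx_lock_class = 0; /* one key for the whole array */
    struct task_model t = { .depth = 0 };
    int warnings = 0;

    for (unsigned int i = 0; i < N_RINGS; i++) {
        if (model_acquire(&t, &tx_lock_class, nested ? i : 0))
            warnings++;
    }
    return warnings;
}
```

In this model `model_lock_all_rings(false)` flags every acquisition after the first, while `model_lock_all_rings(true)` flags none, because each ring lock then carries its own subclass. The real lockdep reports only the first occurrence and then disables itself.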


[-- Attachment #2: 0001-net-cassini-use-nested-lock-annotation.patch --]
[-- Type: text/x-diff, Size: 936 bytes --]

From 1f3fcb0cb141e167c5389861eb0a6cb935f6a3d5 Mon Sep 17 00:00:00 2001
From: Emil Goode <emilgoode@gmail.com>
Date: Thu, 8 May 2014 12:49:24 +0200
Subject: [PATCH] net: cassini: use nested lock annotation

In the cas_lock_tx function we acquire multiple locks in a loop and
need to use nested lock annotation to prevent lockdep warnings.

Signed-off-by: Emil Goode <emilgoode@gmail.com>
---
 drivers/net/ethernet/sun/cassini.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sun/cassini.c b/drivers/net/ethernet/sun/cassini.c
index df8d383..b9ac20f 100644
--- a/drivers/net/ethernet/sun/cassini.c
+++ b/drivers/net/ethernet/sun/cassini.c
@@ -246,7 +246,7 @@ static inline void cas_lock_tx(struct cas *cp)
 	int i;
 
 	for (i = 0; i < N_TX_RINGS; i++)
-		spin_lock(&cp->tx_lock[i]);
+		spin_lock_nested(&cp->tx_lock[i], i);
 }
 
 static inline void cas_lock_all(struct cas *cp)
-- 
1.7.10.4
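
One thing to keep in mind when using the loop index as the subclass: lockdep only supports a limited number of subclasses per class (MAX_LOCKDEP_SUBCLASSES, which is 8 in include/linux/lockdep.h as far as I know), so this works only while N_TX_RINGS stays within that limit. A tiny sketch of the check, with a made-up helper name and the limit copied in as an assumption:

```c
#include <stdbool.h>

/* Mirrors MAX_LOCKDEP_SUBCLASSES; the value 8 is an assumption here. */
#define MODEL_MAX_SUBCLASSES 8u

/* A loop over n_locks locks uses subclasses 0 .. n_locks - 1, so the
 * highest one must stay below the limit. */
static bool index_usable_as_subclass(unsigned int n_locks)
{
    return n_locks >= 1 && n_locks - 1 < MODEL_MAX_SUBCLASSES;
}
```

With more locks than that, a per-lock class key would be the more likely route, but that is outside what the patch above does.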



* Re: cassini: possible recursive locking detected
  2014-05-08 12:53 ` Emil Goode
@ 2014-05-08 20:00   ` Meelis Roos
  2014-05-08 22:38     ` Emil Goode
  0 siblings, 1 reply; 9+ messages in thread
From: Meelis Roos @ 2014-05-08 20:00 UTC
  To: Emil Goode; +Cc: netdev

> Hello Meelis,
> 
> I think this warning happens because we acquire multiple locks
> in a loop in cas_lock_tx() and I believe we should use nested
> lock annotation here.
> 
> Perhaps you would like to try the attached patch?

Yes, it silences the warning.

> It won't fix the deadlock that you mentioned though.

Yes, the hang still happens, following a
ERROR: System Hardware FATAL RESET from  CPU0 CPU2

-- 
Meelis Roos (mroos@linux.ee)


* Re: cassini: possible recursive locking detected
  2014-05-08 20:00   ` Meelis Roos
@ 2014-05-08 22:38     ` Emil Goode
  2014-05-09  5:37       ` Meelis Roos
  0 siblings, 1 reply; 9+ messages in thread
From: Emil Goode @ 2014-05-08 22:38 UTC
  To: Meelis Roos; +Cc: netdev

Hello,

On Thu, May 08, 2014 at 11:00:03PM +0300, Meelis Roos wrote:
> > Hello Meelis,
> > 
> > I think this warning happens because we acquire multiple locks
> > in a loop in cas_lock_tx() and I believe we should use nested
> > lock annotation here.
> > 
> > Perhaps you would like to try the attached patch?
> 
> Yes, it silences the warning.
> 

Ok thanks for testing, I'll send that patch.

> > It won't fix the deadlock that you mentioned though.
> 
> Yes, the hang still happens, following a
> ERROR: System Hardware FATAL RESET from  CPU0 CPU2
> 

Are you able to get the full dmesg output?
I think this could be hard to solve since I don't have
the hardware, but could take a look.


* Re: cassini: possible recursive locking detected
  2014-05-08 22:38     ` Emil Goode
@ 2014-05-09  5:37       ` Meelis Roos
  2014-05-09  9:06         ` Emil Goode
  0 siblings, 1 reply; 9+ messages in thread
From: Meelis Roos @ 2014-05-09  5:37 UTC
  To: Emil Goode; +Cc: netdev

> > > It won't fix the deadlock that you mentioned though.
> > 
> > Yes, the hang still happens, following a
> > ERROR: System Hardware FATAL RESET from  CPU0 CPU2
> > 
> 
> Are you able to get the full dmesg output?
> I think this could be hard to solve since I don't have
> the hardware, but could take a look.

There is only the sparc64 firmware-specific FATAL RESET, including pages of
state dump for all 4 CPUs and their MMUs etc. Nothing from the Linux side.

So it looks like a recursive fault or something similar.

-- 
Meelis Roos (mroos@linux.ee)


* Re: cassini: possible recursive locking detected
  2014-05-09  5:37       ` Meelis Roos
@ 2014-05-09  9:06         ` Emil Goode
  2014-05-09 20:33           ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Emil Goode @ 2014-05-09  9:06 UTC
  To: Meelis Roos; +Cc: netdev

On Fri, May 09, 2014 at 08:37:30AM +0300, Meelis Roos wrote:
> > > > It won't fix the deadlock that you mentioned though.
> > > 
> > > Yes, the hang still happens, following a
> > > ERROR: System Hardware FATAL RESET from  CPU0 CPU2
> > > 
> > 
> > Are you able to get the full dmesg output?
> > I think this could be hard to solve since I don't have
> > the hardware, but could take a look.
> 
> There is only the sparc64 firmware-specific FATAL RESET, including pages of
> state dump for all 4 CPUs and their MMUs etc. Nothing from the Linux side.
> 
> So it looks like a recursive fault or something similar.
> 

I searched the net a bit and found these old threads:

http://lists.freebsd.org/pipermail/freebsd-sparc64/2010-January/006935.html

"FreeBSD currently crashes on older models of V480 when attempting
 to use an on-board NIC due to what appears to be a CPU bug which
 needs to be worked around."

http://marc.info/?l=linux-sparc&m=122220796209509&w=2

"I noticed the cassini network driver for the builtin gigabit
 network is unstable and brings the kernel down on a dualprocessor sparc
 SunFire 480R with a Hardware FATAL RESET"

I think you should ask about this on the sparclinux mailing list.

http://vger.kernel.org/vger-lists.html#sparclinux

I would say it's very unlikely that the problem is related to that lockdep warning.


* Re: cassini: possible recursive locking detected
  2014-05-09  9:06         ` Emil Goode
@ 2014-05-09 20:33           ` David Miller
  2014-05-16 20:12             ` Bjørn Mork
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2014-05-09 20:33 UTC
  To: emilgoode; +Cc: mroos, netdev

From: Emil Goode <emilgoode@gmail.com>
Date: Fri, 9 May 2014 11:06:42 +0200

> I searched the net a bit and found these old threads:
> 
> http://lists.freebsd.org/pipermail/freebsd-sparc64/2010-January/006935.html
> 
> "FreeBSD currently crashes on older models of V480 when attempting
>  to use an on-board NIC due to what appears to be a CPU bug which
>  needs to be worked around."

I wish I still had a copy of that Schizo chip errata document referenced
at the end of that posting, it's not accessible any longer.


* Re: cassini: possible recursive locking detected
  2014-05-09 20:33           ` David Miller
@ 2014-05-16 20:12             ` Bjørn Mork
  2014-05-16 20:48               ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Bjørn Mork @ 2014-05-16 20:12 UTC
  To: David Miller; +Cc: emilgoode, mroos, netdev

David Miller <davem@davemloft.net> writes:
> From: Emil Goode <emilgoode@gmail.com>
> Date: Fri, 9 May 2014 11:06:42 +0200
>
>> I searched the net a bit and found these old threads:
>> 
>> http://lists.freebsd.org/pipermail/freebsd-sparc64/2010-January/006935.html
>> 
>> "FreeBSD currently crashes on older models of V480 when attempting
>>  to use an on-board NIC due to what appears to be a CPU bug which
>>  needs to be worked around."
>
> I wish I still had a copy of that Schizo chip errata document referenced
> at the end of that posting, it's not accessible any longer.

The wayback machine saved it for you:
https://web.archive.org/web/20090701005954/http://www.sun.com/processors/manuals/External_Schizo_Errata.pdf



Bjørn


* Re: cassini: possible recursive locking detected
  2014-05-16 20:12             ` Bjørn Mork
@ 2014-05-16 20:48               ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2014-05-16 20:48 UTC
  To: bjorn; +Cc: emilgoode, mroos, netdev

From: Bjørn Mork <bjorn@mork.no>
Date: Fri, 16 May 2014 22:12:15 +0200

> David Miller <davem@davemloft.net> writes:
>> From: Emil Goode <emilgoode@gmail.com>
>> Date: Fri, 9 May 2014 11:06:42 +0200
>>
>>> I searched the net a bit and found these old threads:
>>> 
>>> http://lists.freebsd.org/pipermail/freebsd-sparc64/2010-January/006935.html
>>> 
>>> "FreeBSD currently crashes on older models of V480 when attempting
>>>  to use an on-board NIC due to what appears to be a CPU bug which
>>>  needs to be worked around."
>>
>> I wish I still had a copy of that Schizo chip errata document referenced
>> at the end of that posting, it's not accessible any longer.
> 
> The wayback machine saved it for you:
> https://web.archive.org/web/20090701005954/http://www.sun.com/processors/manuals/External_Schizo_Errata.pdf

Thanks, Meelis pointed out something similar to me in private correspondence :)
