All the mail mirrored from lore.kernel.org
* [Ocfs2-devel] Cluster blocked, so we have to reboot all nodes to recover. Are there any patches for it? Thanks.
@ 2014-08-21  1:07 Guozhonghua
  2014-08-21  1:59 ` Joseph Qi
  0 siblings, 1 reply; 3+ messages in thread
From: Guozhonghua @ 2014-08-21  1:07 UTC (permalink / raw)
  To: ocfs2-devel

Hi, everyone

We have hit the blocked cluster several times, and the log is always the same; we have to reboot all the nodes of the cluster to recover.
Is there any patch that fixes this bug?
[<ffffffff817539a5>] schedule_timeout+0x1e5/0x250
[<ffffffff81755a77>] wait_for_completion+0xa7/0x160
[<ffffffff8109c9b0>] ? try_to_wake_up+0x2c0/0x2c0
[<ffffffffa0564063>] __ocfs2_cluster_lock.isra.30+0x1f3/0x820 [ocfs2]


As we test with many nodes in one cluster, maybe ten or twenty, the cluster keeps getting blocked; the full log is below.
The kernel version is 3.13.6.


Aug 20 10:05:43 server211 kernel: [82025.281828]       Tainted: GF       W  O 3.13.6 #5
Aug 20 10:05:43 server211 kernel: [82025.281830] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 20 10:05:43 server211 kernel: [82025.281833] mount.ocfs2     D 0000000000000000     0 57890  57889 0x00000000
Aug 20 10:05:43 server211 kernel: [82025.281838]  ffff880427e03888 0000000000000002 ffff880427e03828 ffffffff8101cba3
Aug 20 10:05:43 server211 kernel: [82025.281842]  ffff8804270a1810 0000000000014440 ffff880427e03fd8 0000000000014440
Aug 20 10:05:43 server211 kernel: [82025.281845]  ffff88042958e040 ffff8804270a1810 ffff8804270a1810 ffff880427e03a60
Aug 20 10:05:43 server211 kernel: [82025.281849] Call Trace:
Aug 20 10:05:43 server211 kernel: [82025.281862]  [<ffffffff8101cba3>] ? native_sched_clock+0x13/0x80
Aug 20 10:05:43 server211 kernel: [82025.281867]  [<ffffffff817547d9>] schedule+0x29/0x70
Aug 20 10:05:43 server211 kernel: [82025.281870]  [<ffffffff817539a5>] schedule_timeout+0x1e5/0x250
Aug 20 10:05:43 server211 kernel: [82025.281874]  [<ffffffff81755a77>] wait_for_completion+0xa7/0x160
Aug 20 10:05:43 server211 kernel: [82025.281879]  [<ffffffff8109c9b0>] ? try_to_wake_up+0x2c0/0x2c0
Aug 20 10:05:43 server211 kernel: [82025.281907]  [<ffffffffa0564063>] __ocfs2_cluster_lock.isra.30+0x1f3/0x820 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.281910]  [<ffffffff8175501c>] ? out_of_line_wait_on_bit+0x7c/0x90
Aug 20 10:05:43 server211 kernel: [82025.281922]  [<ffffffffa0562493>] ? ocfs2_inode_lock_res_init+0x73/0x160 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.281934]  [<ffffffffa05658ca>] ocfs2_inode_lock_full_nested+0x13a/0xb80 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.281958]  [<ffffffffa0576571>] ? ocfs2_iget+0x121/0x7d0 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.281971]  [<ffffffffa057a9f2>] ocfs2_journal_init+0x92/0x480 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.281986]  [<ffffffffa05bc3f1>] ocfs2_fill_super+0x15a1/0x25a0 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.281992]  [<ffffffff81394e49>] ? vsnprintf+0x309/0x600
Aug 20 10:05:43 server211 kernel: [82025.281998]  [<ffffffff811c4c99>] mount_bdev+0x1b9/0x200
Aug 20 10:05:43 server211 kernel: [82025.282011]  [<ffffffffa05bae50>] ? ocfs2_initialize_super.isra.208+0x1470/0x1470 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.282022]  [<ffffffffa05adbe5>] ocfs2_mount+0x15/0x20 [ocfs2]
Aug 20 10:05:43 server211 kernel: [82025.282025]  [<ffffffff811c58c3>] mount_fs+0x43/0x1b0
Aug 20 10:05:43 server211 kernel: [82025.282029]  [<ffffffff811e0ab6>] vfs_kern_mount+0x76/0x130
Aug 20 10:05:43 server211 kernel: [82025.282032]  [<ffffffff811e2d47>] do_mount+0x237/0xa90
Aug 20 10:05:43 server211 kernel: [82025.282037]  [<ffffffff8115800e>] ? __get_free_pages+0xe/0x40
Aug 20 10:05:43 server211 kernel: [82025.282040]  [<ffffffff811e297a>] ? copy_mount_options+0x3a/0x180
Aug 20 10:05:43 server211 kernel: [82025.282043]  [<ffffffff811e3920>] SyS_mount+0x90/0xe0
Aug 20 10:05:43 server211 kernel: [82025.282048]  [<ffffffff81760fbf>] tracesys+0xe1/0xe6
Aug 20 10:06:01 server211 CRON[803]: (root) CMD (   /opt/bin/tomcat_check.sh)



-------------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Ocfs2-devel] Cluster blocked, so we have to reboot all nodes to recover. Are there any patches for it? Thanks.
  2014-08-21  1:07 [Ocfs2-devel] Cluster blocked, so we have to reboot all nodes to recover. Are there any patches for it? Thanks Guozhonghua
@ 2014-08-21  1:59 ` Joseph Qi
  2014-08-21  2:31   ` [Ocfs2-devel] Re: " Guozhonghua
  0 siblings, 1 reply; 3+ messages in thread
From: Joseph Qi @ 2014-08-21  1:59 UTC (permalink / raw)
  To: ocfs2-devel

From the stack, it seems that it blocks on loading the journal during mount.
Is the journal lock already owned by another node?
Try debugfs.ocfs2 'fs_locks -B' and 'dlm_locks xxx' (where xxx is the lock resource name) to find out why.
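For example, something like the following, run on the stuck node, should list the busy lock resources and then dump the DLM state for one of them; the device path and the lock name below are only placeholders for your volume and for a name reported as busy:

  # list lock resources that are currently busy on this node
  debugfs.ocfs2 -R "fs_locks -B" /dev/sdb1

  # dump the DLM state for one of the busy lock names printed above
  debugfs.ocfs2 -R "dlm_locks M0000000000000000000002xxxxxxxx" /dev/sdb1

Running the same dlm_locks query on the other nodes should show which node is currently holding the lock.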

On 2014/8/21 9:07, Guozhonghua wrote:
> Hi, everyone
>
> We have hit the blocked cluster several times, and the log is always the same; we have to reboot all the nodes of the cluster to recover.
> Is there any patch that fixes this bug?
>
> [...]
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Ocfs2-devel] Re: Cluster blocked, so we have to reboot all nodes to recover. Are there any patches for it? Thanks.
  2014-08-21  1:59 ` Joseph Qi
@ 2014-08-21  2:31   ` Guozhonghua
  0 siblings, 0 replies; 3+ messages in thread
From: Guozhonghua @ 2014-08-21  2:31 UTC (permalink / raw)
  To: ocfs2-devel

Thanks, Joseph.

But when I try it, there is an error:
debugfs: fs_locks -B
Debug string proto 3 found, but 2 is the highest I understand.

The ocfs2-tools version is 1.6.4. Is the tools version too old?
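For reference, this is how we checked the versions on our nodes (the package query command is an assumption about our distribution; adjust as needed):

  # installed ocfs2-tools package; shows 1.6.4 here
  dpkg -l ocfs2-tools

  # running kernel that produced the "proto 3" debug output
  uname -r    # 3.13.6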


We have hit the blocked scenario several times with different call stacks; is there any way to change the code to avoid it?
I mean returning an error code for a blocked lock instead of waiting for the completion.

Thanks


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-08-21  2:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-21  1:07 [Ocfs2-devel] Cluster blocked, so we have to reboot all nodes to recover. Are there any patches for it? Thanks Guozhonghua
2014-08-21  1:59 ` Joseph Qi
2014-08-21  2:31   ` [Ocfs2-devel] Re: " Guozhonghua
