The problem turned out to be an incorrect order and colocation setup in the CIB. Everything is stable now. <br><br>And yes, fencing is crucial! Everyone keeps saying that, but I haven&#39;t run into a situation where it was needed yet. I tend to run failover tests to verify that fencing works. <br>
<div class="gmail_extra"><br><br><div class="gmail_quote">2012/12/3 Lars Ellenberg <span dir="ltr">&lt;<a href="mailto:lars.ellenberg@linbit.com" target="_blank">lars.ellenberg@linbit.com</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Sun, Dec 02, 2012 at 11:52:47AM +0100, Stefan Midjich wrote:<br>
&gt; Fortunately the data volume was only mounted but not in use.<br>
&gt;<br>
&gt; I found a similar list post on<br>
&gt; <a href="http://lists.linbit.com/pipermail/drbd-user/2008-April/009156.html" target="_blank">http://lists.linbit.com/pipermail/drbd-user/2008-April/009156.html</a> but it<br>
&gt; had no replies on what could cause this. I&#39;ve been thinking the DRBD<br>
&gt; traffic should be on a separate network but have not set this up yet. Right<br>
&gt; now the DRBD traffic goes over the same vNetwork that other traffic goes<br>
&gt; over, including multicast VIP traffic from both LVS and pacemaker clusters.<br>
&gt;<br>
&gt; In other words, the SyncSource node reached a critical load average and<br>
&gt; became unresponsive. This is a VM setup split over different physical ESX<br>
&gt; hosts, but even the local console was dead, so a forced reset was in<br>
&gt; order.<br>
&gt;<br>
&gt; The cluster services came up fine, corosync+pacemaker+o2cb+ocfs2_dlm. The<br>
<br>
</div>With cluster file systems,<br>
you need tested and confirmed working fencing, aka STONITH.<br>
<br>
Fencing/STONITH is a hard requirement.<br>
This is not negotiable.<br>
<br>
If you try to get away without it,<br>
and the network layer has so much as a hiccup,<br>
your IO will block.<br>
<br>
Hard.<br>
<br>
Up to here, this was not even considering DRBD...<br>
<br>
If you want to use cluster file systems on top of DRBD,<br>
you *additionally* need to integrate DRBD<br>
replication link breakage into your fencing setup.<br>
<br>
Some keywords to search for:<br>
fencing resource-and-stonith; fence-peer handler; obliterate-peer;<br>
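<br>
As a rough illustration of where those keywords go (a sketch only, assuming DRBD 8.3 syntax and the Pacemaker helper scripts that DRBD installs under /usr/lib/drbd; the paths are assumptions that may differ on your distribution, and this has not been tested against the setup in this thread):<br>
<pre>
resource shared0 {
  disk {
    # freeze IO and call the fence-peer handler when the replication link breaks
    fencing resource-and-stonith;
  }
  handlers {
    # assumed paths; crm-fence-peer.sh targets Pacemaker clusters
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  net {
    allow-two-primaries;
  }
  # device, meta-disk, syncer and "on" sections as in the existing definition
}
</pre>
The obliterate-peer keyword refers to an alternative fence-peer script that asks the cluster's fence daemon to shoot the peer; whether it or crm-fence-peer.sh is the better fit depends on the cluster stack in use.<br>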
<div class="HOEnZb"><div class="h5"><br>
<br>
&gt; cluster is Debian Squeeze with corosync, pacemaker, openais and cman from<br>
&gt; backports. Only corosync and pacemaker are services actually used. Other<br>
&gt; packages are only installed for access to things like fencing and resource<br>
&gt; agents. Drbd 8.3.7 is used from Debian stable repository.<br>
&gt;<br>
&gt; The drbd config is mostly stock; here is the resource definition.<br>
&gt;<br>
&gt; resource shared0 {<br>
&gt;  meta-disk internal;<br>
&gt;  device  /dev/drbd1;<br>
&gt;  syncer {<br>
&gt;   verify-alg sha1;<br>
&gt;  }<br>
&gt;  net {<br>
&gt;   allow-two-primaries;<br>
&gt;  }<br>
&gt;  on appserver01 {<br>
&gt;   disk   /dev/mapper/shared0_appserver01-lv0;<br>
&gt;   address  10.221.182.31:7789;<br>
&gt;  }<br>
&gt;  on appserver02 {<br>
&gt;   disk   /dev/mapper/shared0_appserver02-lv0;<br>
&gt;   address  10.221.182.32:7789;<br>
&gt;  }<br>
&gt; }<br>
&gt;<br>
&gt; The logs on the SyncSource node show the following happening at the time of<br>
&gt; the failure.<br>
&gt;<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353113] block drbd1: peer(<br>
&gt; Primary -&gt; Unknown ) conn( SyncSource -&gt; NetworkFailure )<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353123] block drbd1: asender<br>
&gt; terminated<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353126] block drbd1:<br>
&gt; Terminating drbd1_asender<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353967] block drbd1: Connection<br>
&gt; closed<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353974] block drbd1: conn(<br>
&gt; NetworkFailure -&gt; Unconnected )<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353977] block drbd1: receiver<br>
&gt; terminated<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353978] block drbd1: Restarting<br>
&gt; drbd1_receiver<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353980] block drbd1: receiver<br>
&gt; (re)started<br>
&gt; Dec  2 02:09:56 appserver01 kernel: [123911.353983] block drbd1: conn(<br>
&gt; Unconnected -&gt; WFConnection )<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093326] ocfs2rec      D<br>
&gt; ffff88017e7fa350     0 26221      2 0x00000000<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093330]  ffff88017e7fa350<br>
&gt; 0000000000000046 ffff88018dad4000 0000000000000010<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093333]  0000000000000616<br>
&gt; ffffea000455c168 000000000000f9e0 ffff88018dad5fd8<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093335]  0000000000015780<br>
&gt; 0000000000015780 ffff88017e266350 ffff88017e266648<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093338] Call Trace:<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093346]  [&lt;ffffffff812fcc4f&gt;] ?<br>
&gt; rwsem_down_failed_common+0x8c/0xa8<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093348]  [&lt;ffffffff812fccb2&gt;] ?<br>
&gt; rwsem_down_read_failed+0x22/0x2b<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093353]  [&lt;ffffffff811965f4&gt;] ?<br>
&gt; call_rwsem_down_read_failed+0x14/0x30<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093359]  [&lt;ffffffffa028f0bc&gt;] ?<br>
&gt; user_dlm_lock+0x0/0x47 [ocfs2_stack_user]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093363]  [&lt;ffffffff810b885b&gt;] ?<br>
&gt; zone_watermark_ok+0x20/0xb1<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093365]  [&lt;ffffffff812fc665&gt;] ?<br>
&gt; down_read+0x17/0x19<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093371]  [&lt;ffffffffa02133b6&gt;] ?<br>
&gt; dlm_lock+0x56/0x149 [dlm]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093374]  [&lt;ffffffff810c79c0&gt;] ?<br>
&gt; zone_statistics+0x3c/0x5d<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093377]  [&lt;ffffffffa028f0fe&gt;] ?<br>
&gt; user_dlm_lock+0x42/0x47 [ocfs2_stack_user]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093380]  [&lt;ffffffffa028f000&gt;] ?<br>
&gt; fsdlm_lock_ast_wrapper+0x0/0x2d [ocfs2_stack_user]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093382]  [&lt;ffffffffa028f02d&gt;] ?<br>
&gt; fsdlm_blocking_ast_wrapper+0x0/0x17 [ocfs2_stack_user]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093391]  [&lt;ffffffffa031587a&gt;] ?<br>
&gt; __ocfs2_cluster_lock+0x47c/0x8c5 [ocfs2]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093395]  [&lt;ffffffff8100f657&gt;] ?<br>
&gt; __switch_to+0x140/0x297<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093402]  [&lt;ffffffffa0315cd8&gt;] ?<br>
&gt; ocfs2_cluster_lock+0x15/0x17 [ocfs2]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093408]  [&lt;ffffffffa03195c2&gt;] ?<br>
&gt; ocfs2_super_lock+0xc7/0x2a9 [ocfs2]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093415]  [&lt;ffffffffa03195c2&gt;] ?<br>
&gt; ocfs2_super_lock+0xc7/0x2a9 [ocfs2]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093421]  [&lt;ffffffffa0329f9e&gt;] ?<br>
&gt; __ocfs2_recovery_thread+0x0/0x122b [ocfs2]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093428]  [&lt;ffffffffa032a07f&gt;] ?<br>
&gt; __ocfs2_recovery_thread+0xe1/0x122b [ocfs2]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093430]  [&lt;ffffffff812fba90&gt;] ?<br>
&gt; thread_return+0x79/0xe0<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093433]  [&lt;ffffffff8103a403&gt;] ?<br>
&gt; activate_task+0x22/0x28<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093436]  [&lt;ffffffff8104a44f&gt;] ?<br>
&gt; try_to_wake_up+0x289/0x29b<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093443]  [&lt;ffffffffa0329f9e&gt;] ?<br>
&gt; __ocfs2_recovery_thread+0x0/0x122b [ocfs2]<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093446]  [&lt;ffffffff81064d79&gt;] ?<br>
&gt; kthread+0x79/0x81<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093449]  [&lt;ffffffff81011baa&gt;] ?<br>
&gt; child_rip+0xa/0x20<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093451]  [&lt;ffffffff81064d00&gt;] ?<br>
&gt; kthread+0x0/0x81<br>
&gt; Dec  2 02:13:06 appserver01 kernel: [124101.093453]  [&lt;ffffffff81011ba0&gt;] ?<br>
&gt; child_rip+0x0/0x20<br>
&gt;<br>
&gt; Then a few moments passed.<br>
&gt;<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.071151] block drbd1: Handshake<br>
&gt; successful: Agreed network protocol version 91<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.071157] block drbd1: conn(<br>
&gt; WFConnection -&gt; WFReportParams )<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.076732] block drbd1: Starting<br>
&gt; asender thread (from drbd1_receiver [7526])<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.078447] block drbd1:<br>
&gt; data-integrity-alg: &lt;not-used&gt;<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.078456] block drbd1:<br>
&gt; drbd_sync_handshake:<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.078459] block drbd1: self<br>
&gt; 7843E95E721AF0ED:54BC6F3AD7F42585:52FF69A8720BCEAC:BA309D9B7FCA3C07<br>
&gt; bits:115301551 flags:0<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.078461] block drbd1: peer<br>
&gt; 54BC6F3AD7F42584:0000000000000000:0000000000000000:0000000000000000<br>
&gt; bits:115314775 flags:2<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.078464] block drbd1:<br>
&gt; uuid_compare()=1 by rule 70<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.078465] block drbd1: Becoming<br>
&gt; sync source due to disk states.<br>
&gt; Dec  2 02:13:32 appserver01 kernel: [124127.078469] block drbd1: peer(<br>
&gt; Unknown -&gt; Secondary ) conn( WFReportParams -&gt; WFBitMapS )<br>
&gt; Dec  2 02:13:39 appserver01 kernel: [124134.091066] block drbd1: conn(<br>
&gt; WFBitMapS -&gt; SyncSource )<br>
&gt; Dec  2 02:13:39 appserver01 kernel: [124134.091078] block drbd1: Began<br>
&gt; resync as SyncSource (will sync 461259100 KB [115314775 bits set]).<br>
&gt;<br>
&gt; After a few more moments it started to post call traces repeatedly. Here is<br>
&gt; just one cycle of these traces. At this point the load was critical and I<br>
&gt; assume the server was unresponsive, because the status of the alarms<br>
&gt; didn&#39;t change until manual intervention. It kept posting call traces for<br>
&gt; 4 minutes, and then I assume DRBD died, because it was quiet until<br>
&gt; reboot.<br>
&gt;<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996240] ocfs2rec      D<br>
&gt; ffff88017e7fa350     0 26221      2 0x00000000<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996244]  ffff88017e7fa350<br>
&gt; 0000000000000046 ffff88018dad4000 0000000000000010<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996247]  0000000000000616<br>
&gt; ffffea000455c168 000000000000f9e0 ffff88018dad5fd8<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996250]  0000000000015780<br>
&gt; 0000000000015780 ffff88017e266350 ffff88017e266648<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996252] Call Trace:<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996260]  [&lt;ffffffff812fcc4f&gt;] ?<br>
&gt; rwsem_down_failed_common+0x8c/0xa8<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996262]  [&lt;ffffffff812fccb2&gt;] ?<br>
&gt; rwsem_down_read_failed+0x22/0x2b<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996267]  [&lt;ffffffff811965f4&gt;] ?<br>
&gt; call_rwsem_down_read_failed+0x14/0x30<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996273]  [&lt;ffffffffa028f0bc&gt;] ?<br>
&gt; user_dlm_lock+0x0/0x47 [ocfs2_stack_user]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996277]  [&lt;ffffffff810b885b&gt;] ?<br>
&gt; zone_watermark_ok+0x20/0xb1<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996279]  [&lt;ffffffff812fc665&gt;] ?<br>
&gt; down_read+0x17/0x19<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996285]  [&lt;ffffffffa02133b6&gt;] ?<br>
&gt; dlm_lock+0x56/0x149 [dlm]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996289]  [&lt;ffffffff810c79c0&gt;] ?<br>
&gt; zone_statistics+0x3c/0x5d<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996291]  [&lt;ffffffffa028f0fe&gt;] ?<br>
&gt; user_dlm_lock+0x42/0x47 [ocfs2_stack_user]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996294]  [&lt;ffffffffa028f000&gt;] ?<br>
&gt; fsdlm_lock_ast_wrapper+0x0/0x2d [ocfs2_stack_user]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996297]  [&lt;ffffffffa028f02d&gt;] ?<br>
&gt; fsdlm_blocking_ast_wrapper+0x0/0x17 [ocfs2_stack_user]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996305]  [&lt;ffffffffa031587a&gt;] ?<br>
&gt; __ocfs2_cluster_lock+0x47c/0x8c5 [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996310]  [&lt;ffffffff8100f657&gt;] ?<br>
&gt; __switch_to+0x140/0x297<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996317]  [&lt;ffffffffa0315cd8&gt;] ?<br>
&gt; ocfs2_cluster_lock+0x15/0x17 [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996323]  [&lt;ffffffffa03195c2&gt;] ?<br>
&gt; ocfs2_super_lock+0xc7/0x2a9 [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996330]  [&lt;ffffffffa03195c2&gt;] ?<br>
&gt; ocfs2_super_lock+0xc7/0x2a9 [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996337]  [&lt;ffffffffa0329f9e&gt;] ?<br>
&gt; __ocfs2_recovery_thread+0x0/0x122b [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996343]  [&lt;ffffffffa032a07f&gt;] ?<br>
&gt; __ocfs2_recovery_thread+0xe1/0x122b [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996346]  [&lt;ffffffff812fba90&gt;] ?<br>
&gt; thread_return+0x79/0xe0<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996349]  [&lt;ffffffff8103a403&gt;] ?<br>
&gt; activate_task+0x22/0x28<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996352]  [&lt;ffffffff8104a44f&gt;] ?<br>
&gt; try_to_wake_up+0x289/0x29b<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996359]  [&lt;ffffffffa0329f9e&gt;] ?<br>
&gt; __ocfs2_recovery_thread+0x0/0x122b [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996362]  [&lt;ffffffff81064d79&gt;] ?<br>
&gt; kthread+0x79/0x81<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996364]  [&lt;ffffffff81011baa&gt;] ?<br>
&gt; child_rip+0xa/0x20<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996366]  [&lt;ffffffff81064d00&gt;] ?<br>
&gt; kthread+0x0/0x81<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996368]  [&lt;ffffffff81011ba0&gt;] ?<br>
&gt; child_rip+0x0/0x20<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996556] ls            D<br>
&gt; ffff8801bb5a2a60     0 26318  26317 0x00000000<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996559]  ffff8801bb5a2a60<br>
&gt; 0000000000000082 ffff8801bb7734c8 ffffffff81103ab9<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996561]  ffff88016843dd58<br>
&gt; ffff88016843ddf8 000000000000f9e0 ffff88016843dfd8<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996563]  0000000000015780<br>
&gt; 0000000000015780 ffff8801bcf1a350 ffff8801bcf1a648<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996566] Call Trace:<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996570]  [&lt;ffffffff81103ab9&gt;] ?<br>
&gt; mntput_no_expire+0x23/0xee<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996573]  [&lt;ffffffff810f75af&gt;] ?<br>
&gt; __link_path_walk+0x6f0/0x6f5<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996580]  [&lt;ffffffffa03296af&gt;] ?<br>
&gt; ocfs2_wait_for_recovery+0x9d/0xb7 [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996582]  [&lt;ffffffff81065046&gt;] ?<br>
&gt; autoremove_wake_function+0x0/0x2e<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996589]  [&lt;ffffffffa0319923&gt;] ?<br>
&gt; ocfs2_inode_lock_full_nested+0x16b/0xb2c [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996596]  [&lt;ffffffffa0324f2d&gt;] ?<br>
&gt; ocfs2_inode_revalidate+0x145/0x221 [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996603]  [&lt;ffffffffa03208d9&gt;] ?<br>
&gt; ocfs2_getattr+0x79/0x16a [ocfs2]<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996606]  [&lt;ffffffff810f2591&gt;] ?<br>
&gt; vfs_fstatat+0x43/0x57<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996609]  [&lt;ffffffff810f25fb&gt;] ?<br>
&gt; sys_newlstat+0x11/0x30<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996612]  [&lt;ffffffff812ff306&gt;] ?<br>
&gt; do_page_fault+0x2e0/0x2fc<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996614]  [&lt;ffffffff812fd1a5&gt;] ?<br>
&gt; page_fault+0x25/0x30<br>
&gt; Dec  2 02:15:06 appserver01 kernel: [124220.996616]  [&lt;ffffffff81010b42&gt;] ?<br>
&gt; system_call_fastpath+0x16/0x1b<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899149] events/0      D<br>
&gt; ffff88017e7faa60     0     6      2 0x00000000<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899153]  ffff88017e7faa60<br>
&gt; 0000000000000046 ffff880006e157e8 ffff8801bf09e388<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899157]  ffff8801bc88f1b8<br>
&gt; ffff8801bc88f1a8 000000000000f9e0 ffff8801bf0b3fd8<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899160]  0000000000015780<br>
&gt; 0000000000015780 ffff8801bf09e350 ffff8801bf09e648<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899162] Call Trace:<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899169]  [&lt;ffffffff812fba90&gt;] ?<br>
&gt; thread_return+0x79/0xe0<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899172]  [&lt;ffffffff812fcc4f&gt;] ?<br>
&gt; rwsem_down_failed_common+0x8c/0xa8<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899175]  [&lt;ffffffff812fccb2&gt;] ?<br>
&gt; rwsem_down_read_failed+0x22/0x2b<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899179]  [&lt;ffffffff811965f4&gt;] ?<br>
&gt; call_rwsem_down_read_failed+0x14/0x30<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899185]  [&lt;ffffffffa028f0bc&gt;] ?<br>
&gt; user_dlm_lock+0x0/0x47 [ocfs2_stack_user]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899188]  [&lt;ffffffff812fc665&gt;] ?<br>
&gt; down_read+0x17/0x19<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899193]  [&lt;ffffffffa02133b6&gt;] ?<br>
&gt; dlm_lock+0x56/0x149 [dlm]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899198]  [&lt;ffffffff810168c1&gt;] ?<br>
&gt; sched_clock+0x5/0x8<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899202]  [&lt;ffffffff81049412&gt;] ?<br>
&gt; update_rq_clock+0xf/0x28<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899205]  [&lt;ffffffff8104a44f&gt;] ?<br>
&gt; try_to_wake_up+0x289/0x29b<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899209]  [&lt;ffffffff810fd0ce&gt;] ?<br>
&gt; pollwake+0x53/0x59<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899211]  [&lt;ffffffff8104a461&gt;] ?<br>
&gt; default_wake_function+0x0/0x9<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899214]  [&lt;ffffffffa028f0fe&gt;] ?<br>
&gt; user_dlm_lock+0x42/0x47 [ocfs2_stack_user]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899217]  [&lt;ffffffffa028f000&gt;] ?<br>
&gt; fsdlm_lock_ast_wrapper+0x0/0x2d [ocfs2_stack_user]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899219]  [&lt;ffffffffa028f02d&gt;] ?<br>
&gt; fsdlm_blocking_ast_wrapper+0x0/0x17 [ocfs2_stack_user]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899228]  [&lt;ffffffffa031587a&gt;] ?<br>
&gt; __ocfs2_cluster_lock+0x47c/0x8c5 [ocfs2]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899231]  [&lt;ffffffff812fba90&gt;] ?<br>
&gt; thread_return+0x79/0xe0<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899237]  [&lt;ffffffffa0315cd8&gt;] ?<br>
&gt; ocfs2_cluster_lock+0x15/0x17 [ocfs2]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899244]  [&lt;ffffffffa0317472&gt;] ?<br>
&gt; ocfs2_orphan_scan_lock+0x5d/0xa8 [ocfs2]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899250]  [&lt;ffffffffa0317472&gt;] ?<br>
&gt; ocfs2_orphan_scan_lock+0x5d/0xa8 [ocfs2]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899257]  [&lt;ffffffffa0328abe&gt;] ?<br>
&gt; ocfs2_queue_orphan_scan+0x29/0x126 [ocfs2]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899259]  [&lt;ffffffff812fc3c6&gt;] ?<br>
&gt; mutex_lock+0xd/0x31<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899266]  [&lt;ffffffffa0328be0&gt;] ?<br>
&gt; ocfs2_orphan_scan_work+0x25/0x4d [ocfs2]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899270]  [&lt;ffffffff81061a13&gt;] ?<br>
&gt; worker_thread+0x188/0x21d<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899276]  [&lt;ffffffffa0328bbb&gt;] ?<br>
&gt; ocfs2_orphan_scan_work+0x0/0x4d [ocfs2]<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899280]  [&lt;ffffffff81065046&gt;] ?<br>
&gt; autoremove_wake_function+0x0/0x2e<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899282]  [&lt;ffffffff8106188b&gt;] ?<br>
&gt; worker_thread+0x0/0x21d<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899284]  [&lt;ffffffff81064d79&gt;] ?<br>
&gt; kthread+0x79/0x81<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899287]  [&lt;ffffffff81011baa&gt;] ?<br>
&gt; child_rip+0xa/0x20<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899289]  [&lt;ffffffff81064d00&gt;] ?<br>
&gt; kthread+0x0/0x81<br>
&gt; Dec  2 02:17:06 appserver01 kernel: [124340.899291]  [&lt;ffffffff81011ba0&gt;] ?<br>
&gt; child_rip+0x0/0x20<br>
&gt;<br>
&gt; --<br>
&gt; Hälsningar / Greetings<br>
&gt;<br>
&gt; Stefan Midjich<br>
&gt; [De omnibus dubitandum]<br>
<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
: Lars Ellenberg<br>
: LINBIT | Your Way to High Availability<br>
: DRBD/HA support and consulting <a href="http://www.linbit.com" target="_blank">http://www.linbit.com</a><br>
<br>
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.<br>
__<br>
please don&#39;t Cc me, but send to list   --   I&#39;m subscribed<br>
</font></span></blockquote></div><br><br clear="all"><br>-- <br>Hälsningar / Greetings<br><br>Stefan Midjich<br>[De omnibus dubitandum]<br>
</div>