<div dir="ltr">Hi drbd team,<div><br></div><div>I recently ran into a DRBD problem and hope you can help.</div><div><br></div><div>The problem can be reproduced in 8.4.4, 8.4.5 and 8.4.6.</div><div><br></div><div>I have a two-node cluster with two DRBD disks. One disk is UpToDate and the other is syncing.</div><div>I cut the network on the standby node while that disk is still syncing.</div><div>I configured fencing=resource-and-stonith, and I expect the DRBD fence handler to be called when the network goes down. This always works as expected if both disks are UpToDate when I shut down the network on the standby.</div><div>But when one disk is syncing, DRBD suspends I/O on both disks and the fence handler is never called. I can also see that the DRBD receiver thread (drbd_r_cic) is stuck in D state.</div><div><br></div><div>Some logs on the primary:</div><div><p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:53:49 shrvm219 kernel: drbd cic: PingAck did not arrive in time.</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:53:49 shrvm219 kernel: block drbd1: conn( SyncSource -&gt; NetworkFailure
)</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:53:49 shrvm219 kernel: block drbd2: conn( Connected -&gt; NetworkFailure )
pdsk( UpToDate -&gt; DUnknown )</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:53:49 shrvm219 kernel: drbd cic: peer( Secondary -&gt; Unknown ) susp( 0
-&gt; 1 )</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:53:49 shrvm219 kernel: drbd cic: susp( 0 -&gt; 1 )</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:53:49 shrvm219 kernel: drbd cic: asender terminated</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:53:49 shrvm219 kernel: drbd cic: Terminating drbd_a_cic</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:red">&gt;&gt;&gt; Note: the expected "Connection closed" message never appears here.</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
6 16:57:28 shrvm220 kernel: INFO: task xfsalloc/0:805 blocked for more than 120
seconds.</span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black"> </span></p>

<p class="MsoNormal" style="margin-left:0.5in"><span style="color:black">Apr 
3 20:57:36 shrvm220 kernel: INFO: task xfsaild/drbd2:16778 blocked for more
than 120 seconds.</span></p><p class="MsoNormal" style="margin-left:0.5in"><br></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">[root@shrvm219 ~]# ps -ax | grep drbd</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">Warning: bad syntax, perhaps a bogus &#39;-&#39;? See
/usr/share/doc/procps-3.2.8/FAQ</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">5683 ?       
S      0:00 [drbd-reissue]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">11386 pts/0   
S+     0:00 grep drbd</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12098 ?       
S      0:00 [drbd_submit]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12103 ?       
S      0:00 [drbd_submit]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12118 ?       
S      0:02 [drbd_w_cic]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12139 ?       
D      0:03 [drbd_r_cic]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12847 ?       
S      0:00 [xfsbufd/drbd2]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12848 ?      
 S      0:00 [xfs-cil/drbd2]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12849 ?       
D      0:00 [xfssyncd/drbd2]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">12850 ?       
S      0:02 [xfsaild/drbd2]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">13359 ?       
S      0:00 [xfsbufd/drbd1]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">13360 ?       
S      0:00 [xfs-cil/drbd1]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">13361 ?       
D      0:00 [xfssyncd/drbd1]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;">13362 ?       
S      0:02
[xfsaild/drbd1]</span></p><p class="MsoNormal"><span style="font-size:9pt;font-family:&#39;Lucida Console&#39;"><br></span></p><p class="MsoNormal"><span style="color:rgb(31,73,125)">This is part of the kernel stack trace. I think DRBD is stuck in conn_disconnect.</span></p><p class="MsoNormal"><span style="color:rgb(31,73,125)"> </span></p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: SysRq : Show Blocked
State</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: 
task                       
PC stack   pid father</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: <span style="color:rgb(192,0,0)">drbd_r_cic    </span>D
0000000000000000     0 12139 
    2 0x00000084</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: ffff88023886fd90
0000000000000046 0000000000000000 ffff88023886fd54</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: ffff88023886fd20
ffff88023fc23040 000002b2f338c2ce ffff8800283158c0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: 00000000000005ff
000000010028bb73 ffff88023ab87058 ffff88023886ffd8</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: Call Trace:</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8109ee2e&gt;] ? prepare_to_wait+0x4e/0x80</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa0394d4d&gt;] conn_disconnect+0x22d/0x4f0 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: [&lt;ffffffff8109eb00&gt;]
? autoremove_wake_function+0x0/0x40</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa0395120&gt;] drbd_receiver+0x110/0x220 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa03a69e0&gt;] ? drbd_thread_setup+0x0/0x110 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa03a6a0d&gt;] drbd_thread_setup+0x2d/0x110 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa03a69e0&gt;] ? drbd_thread_setup+0x0/0x110 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8109e66e&gt;] kthread+0x9e/0xc0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8100c20a&gt;] child_rip+0xa/0x20</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8109e5d0&gt;] ? kthread+0x0/0xc0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8100c200&gt;] ? child_rip+0x0/0x20</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: <span style="color:rgb(192,0,0)">xfssyncd/drbd </span>D
0000000000000001     0 12849     
2 0x00000080</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: ffff880203387ad0
0000000000000046 0000000000000000 ffff880203387a94</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: ffff880203387a30
ffff88023fc23040 000002b60ef7decd ffff8800283158c0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: 0000000000000400
000000010028ef9d ffff88023921bab8 ffff880203387fd8</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: Call Trace:</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa039c947&gt;] drbd_make_request+0x197/0x330 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8109eb00&gt;] ? autoremove_wake_function+0x0/0x40</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff81270810&gt;] generic_make_request+0x240/0x5a0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa039cbe9&gt;] ? drbd_merge_bvec+0x109/0x2a0 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff81270be0&gt;] submit_bio+0x70/0x120</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff81064b90&gt;] ? default_wake_function+0x0/0x20</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa0208bba&gt;] _xfs_buf_ioapply+0x16a/0x200 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa01ef50a&gt;] ? xlog_bdstrat+0x2a/0x60 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa020a87f&gt;] xfs_buf_iorequest+0x4f/0xe0 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa01ef50a&gt;] xlog_bdstrat+0x2a/0x60 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa01f0ce9&gt;] xlog_sync+0x269/0x3e0 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa01f0f13&gt;] xlog_state_release_iclog+0xb3/0xf0 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa01f13a2&gt;] _xfs_log_force+0x122/0x240 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa01f1688&gt;] xfs_log_force+0x38/0x90 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa0214a02&gt;] xfs_sync_worker+0x52/0xa0 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa021491e&gt;] xfssyncd+0x17e/0x210 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa02147a0&gt;] ? xfssyncd+0x0/0x210 [xfs]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8109e66e&gt;] kthread+0x9e/0xc0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8100c20a&gt;] child_rip+0xa/0x20</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: [&lt;ffffffff8109e5d0&gt;]
? kthread+0x0/0xc0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8100c200&gt;] ? child_rip+0x0/0x20</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel<span style="color:rgb(192,0,0)">:
xfssyncd/drbd </span>D 0000000000000001     0
13361      2 0x00000080</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: ffff8802033edad0
0000000000000046 0000000000000000 ffff8802033eda94</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: 0000000000000000
ffff88023fc23040 000002b60ef68ea3 ffff8800283158c0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: 00000000000007fe
000000010028ef9d ffff880239c41058 ffff8802033edfd8</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel: Call Trace:</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa039c947&gt;] drbd_make_request+0x197/0x330 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff8109eb00&gt;] ? autoremove_wake_function+0x0/0x40</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff81270810&gt;] generic_make_request+0x240/0x5a0</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffffa039cbe9&gt;] ? drbd_merge_bvec+0x109/0x2a0 [drbd]</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff81270be0&gt;] submit_bio+0x70/0x120</p><p class="MsoNormal">Apr  7 19:34:08 shrvm219 kernel:
[&lt;ffffffff81064b90&gt;] ? default_wake_function+0x0/0x20</p><p class="MsoNormal"><br></p><p class="MsoNormal">Thanks a lot in advance.</p><p class="MsoNormal"><br></p><p class="MsoNormal">Fang</p></div></div>