[DRBD-user] A question about drbd hang in conn_disconnect

Fang Sun sunf2002 at gmail.com
Thu Apr 9 20:56:13 CEST 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi drbd team,

I recently ran into a DRBD problem, and I hope I can get some help from
you.

This problem is reproducible in 8.4.4, 8.4.5, and 8.4.6.

I have a two-node cluster with two DRBD disks. One disk is UpToDate and
the other is syncing.
I cut the network on the standby node while one disk is still syncing.
I configured fencing=resource-and-stonith, and I expect my DRBD fence-peer
handler to be called when the network goes down. This always works as
expected if both disks are UpToDate when I shut down the network on the
standby node.
But when one disk is syncing, DRBD suspends I/O on both disks and the
fence-peer handler is never called. I can also see the DRBD receiver
thread (drbd_r_cic) stuck in D state.
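
For reference, here is a minimal sketch of the fencing-related part of my
resource configuration (DRBD 8.4 style; the handler scripts shown are the
stock Pacemaker ones as an example, and the on/volume sections are
omitted):

    resource cic {
        disk {
            # freeze I/O and invoke the fence-peer handler on connection loss
            fencing resource-and-stonith;
        }
        handlers {
            # handler I expect to run when the peer becomes unreachable
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # lift the fencing constraint again after a successful resync
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        # on <host> { ... } and volume sections omitted
    }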

Some logs from the primary:

Apr  6 16:53:49 shrvm219 kernel: drbd cic: PingAck did not arrive in time.

Apr  6 16:53:49 shrvm219 kernel: block drbd1: conn( SyncSource ->
NetworkFailure )

Apr  6 16:53:49 shrvm219 kernel: block drbd2: conn( Connected ->
NetworkFailure ) pdsk( UpToDate -> DUnknown )

Apr  6 16:53:49 shrvm219 kernel: drbd cic: peer( Secondary -> Unknown )
susp( 0 -> 1 )

Apr  6 16:53:49 shrvm219 kernel: drbd cic: susp( 0 -> 1 )

Apr  6 16:53:49 shrvm219 kernel: drbd cic: asender terminated

Apr  6 16:53:49 shrvm219 kernel: drbd cic: Terminating drbd_a_cic

>>> Note: there is no "Connection closed" message after this, which
normally marks the end of the connection teardown.

Apr  6 16:57:28 shrvm220 kernel: INFO: task xfsalloc/0:805 blocked for more
than 120 seconds.



Apr  3 20:57:36 shrvm220 kernel: INFO: task xfsaild/drbd2:16778 blocked for
more than 120 seconds.


[root at shrvm219 ~]# ps -ax | grep drbd

Warning: bad syntax, perhaps a bogus '-'? See
/usr/share/doc/procps-3.2.8/FAQ

5683 ?        S      0:00 [drbd-reissue]

11386 pts/0    S+     0:00 grep drbd

12098 ?        S      0:00 [drbd_submit]

12103 ?        S      0:00 [drbd_submit]

12118 ?        S      0:02 [drbd_w_cic]

12139 ?        D      0:03 [drbd_r_cic]

12847 ?        S      0:00 [xfsbufd/drbd2]

12848 ?        S      0:00 [xfs-cil/drbd2]

12849 ?        D      0:00 [xfssyncd/drbd2]

12850 ?        S      0:02 [xfsaild/drbd2]

13359 ?        S      0:00 [xfsbufd/drbd1]

13360 ?        S      0:00 [xfs-cil/drbd1]

13361 ?        D      0:00 [xfssyncd/drbd1]

13362 ?        S      0:02 [xfsaild/drbd1]
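
(The "bad syntax" warning is just procps complaining about mixing BSD and
UNIX option styles; plain "ps ax | grep drbd" avoids it. Note that
drbd_r_cic and both xfssyncd threads are stuck in D state.)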


Here is part of the kernel stack dump; it looks like DRBD is stuck in
conn_disconnect():



Apr  7 19:34:08 shrvm219 kernel: SysRq : Show Blocked State

Apr  7 19:34:08 shrvm219 kernel:  task                        PC stack
pid father

Apr  7 19:34:08 shrvm219 kernel: drbd_r_cic    D 0000000000000000     0
12139      2 0x00000084

Apr  7 19:34:08 shrvm219 kernel: ffff88023886fd90 0000000000000046
0000000000000000 ffff88023886fd54

Apr  7 19:34:08 shrvm219 kernel: ffff88023886fd20 ffff88023fc23040
000002b2f338c2ce ffff8800283158c0

Apr  7 19:34:08 shrvm219 kernel: 00000000000005ff 000000010028bb73
ffff88023ab87058 ffff88023886ffd8

Apr  7 19:34:08 shrvm219 kernel: Call Trace:

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109ee2e>] ?
prepare_to_wait+0x4e/0x80

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa0394d4d>]
conn_disconnect+0x22d/0x4f0 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109eb00>] ?
autoremove_wake_function+0x0/0x40

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa0395120>]
drbd_receiver+0x110/0x220 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa03a69e0>] ?
drbd_thread_setup+0x0/0x110 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa03a6a0d>]
drbd_thread_setup+0x2d/0x110 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa03a69e0>] ?
drbd_thread_setup+0x0/0x110 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

Apr  7 19:34:08 shrvm219 kernel: xfssyncd/drbd D 0000000000000001     0
12849      2 0x00000080

Apr  7 19:34:08 shrvm219 kernel: ffff880203387ad0 0000000000000046
0000000000000000 ffff880203387a94

Apr  7 19:34:08 shrvm219 kernel: ffff880203387a30 ffff88023fc23040
000002b60ef7decd ffff8800283158c0

Apr  7 19:34:08 shrvm219 kernel: 0000000000000400 000000010028ef9d
ffff88023921bab8 ffff880203387fd8

Apr  7 19:34:08 shrvm219 kernel: Call Trace:

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa039c947>]
drbd_make_request+0x197/0x330 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109eb00>] ?
autoremove_wake_function+0x0/0x40

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff81270810>]
generic_make_request+0x240/0x5a0

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa039cbe9>] ?
drbd_merge_bvec+0x109/0x2a0 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff81270be0>] submit_bio+0x70/0x120

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff81064b90>] ?
default_wake_function+0x0/0x20

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa0208bba>]
_xfs_buf_ioapply+0x16a/0x200 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa01ef50a>] ?
xlog_bdstrat+0x2a/0x60 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa020a87f>]
xfs_buf_iorequest+0x4f/0xe0 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa01ef50a>]
xlog_bdstrat+0x2a/0x60 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa01f0ce9>] xlog_sync+0x269/0x3e0
[xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa01f0f13>]
xlog_state_release_iclog+0xb3/0xf0 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa01f13a2>]
_xfs_log_force+0x122/0x240 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa01f1688>]
xfs_log_force+0x38/0x90 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa0214a02>]
xfs_sync_worker+0x52/0xa0 [xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa021491e>] xfssyncd+0x17e/0x210
[xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa02147a0>] ? xfssyncd+0x0/0x210
[xfs]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

Apr  7 19:34:08 shrvm219 kernel: xfssyncd/drbd D 0000000000000001     0
13361      2 0x00000080

Apr  7 19:34:08 shrvm219 kernel: ffff8802033edad0 0000000000000046
0000000000000000 ffff8802033eda94

Apr  7 19:34:08 shrvm219 kernel: 0000000000000000 ffff88023fc23040
000002b60ef68ea3 ffff8800283158c0

Apr  7 19:34:08 shrvm219 kernel: 00000000000007fe 000000010028ef9d
ffff880239c41058 ffff8802033edfd8

Apr  7 19:34:08 shrvm219 kernel: Call Trace:

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa039c947>]
drbd_make_request+0x197/0x330 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff8109eb00>] ?
autoremove_wake_function+0x0/0x40

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff81270810>]
generic_make_request+0x240/0x5a0

Apr  7 19:34:08 shrvm219 kernel: [<ffffffffa039cbe9>] ?
drbd_merge_bvec+0x109/0x2a0 [drbd]

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff81270be0>] submit_bio+0x70/0x120

Apr  7 19:34:08 shrvm219 kernel: [<ffffffff81064b90>] ?
default_wake_function+0x0/0x20
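
Judging from the prepare_to_wait frame at the top of the drbd_r_cic trace,
the receiver thread seems to be parked in an uninterruptible wait inside
conn_disconnect(), waiting for a drain condition that never becomes true
while I/O is suspended. Below is a simplified sketch of that kernel wait
pattern (my reading of the trace, not the actual DRBD source; wq and
in_flight are stand-ins for whatever waitqueue and counter
conn_disconnect() really uses):

    #include <linux/wait.h>
    #include <linux/sched.h>
    #include <linux/atomic.h>

    static wait_queue_head_t wq;    /* stand-in waitqueue */
    static atomic_t in_flight;      /* stand-in count of pending requests */

    /* prepare_to_wait() + schedule() in TASK_UNINTERRUPTIBLE is what
     * makes a thread show up as "D" in ps until the condition holds.
     * DEFINE_WAIT uses autoremove_wake_function, which matches the
     * "? autoremove_wake_function" frame in the trace above. */
    static void drain_in_flight(void)
    {
            DEFINE_WAIT(wait);

            for (;;) {
                    prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
                    if (atomic_read(&in_flight) == 0)
                            break;
                    schedule();     /* sleep here until woken */
            }
            finish_wait(&wq, &wait);
    }

My guess is that with susp=1 and the peer gone, whatever requests
conn_disconnect() is draining can never complete, so the receiver never
reaches the "Connection closed" step and the fence-peer handler is never
invoked.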


Thanks a lot in advance.


Fang