Hi DRBD team,

I have recently run into a DRBD problem and hope you can help. It can be reproduced on 8.4.4, 8.4.5 and 8.4.6.

I have a two-node cluster with two DRBD disks. One disk is UpToDate and the other is syncing. I cut the network on the standby node while that disk is still syncing. I have configured fencing=resource-and-stonith, and I expect the DRBD fence handler to be called when the network goes down. This always works as expected if both disks are UpToDate when I shut down the network on the standby. But when one disk is syncing, DRBD suspends both disks and the fence handler is never called. I can also see that the DRBD receiver process (drbd_r_cic) is in D state.

Some logs on the primary:

Apr 6 16:53:49 shrvm219 kernel: drbd cic: PingAck did not arrive in time.
Apr 6 16:53:49 shrvm219 kernel: block drbd1: conn( SyncSource -> NetworkFailure )
Apr 6 16:53:49 shrvm219 kernel: block drbd2: conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Apr 6 16:53:49 shrvm219 kernel: drbd cic: peer( Secondary -> Unknown ) susp( 0 -> 1 )
Apr 6 16:53:49 shrvm219 kernel: drbd cic: susp( 0 -> 1 )
Apr 6 16:53:49 shrvm219 kernel: drbd cic: asender terminated
Apr 6 16:53:49 shrvm219 kernel: drbd cic: Terminating drbd_a_cic
>>> There is no "Connection closed" message after this.
Apr 6 16:57:28 shrvm220 kernel: INFO: task xfsalloc/0:805 blocked for more than 120 seconds.
Apr 3 20:57:36 shrvm220 kernel: INFO: task xfsaild/drbd2:16778 blocked for more than 120 seconds.

[root@shrvm219 ~]# ps -ax | grep drbd
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
 5683 ?     S  0:00 [drbd-reissue]
11386 pts/0 S+ 0:00 grep drbd
12098 ?     S  0:00 [drbd_submit]
12103 ?     S  0:00 [drbd_submit]
12118 ?     S  0:02 [drbd_w_cic]
12139 ?     D  0:03 [drbd_r_cic]
12847 ?     S  0:00 [xfsbufd/drbd2]
12848 ?     S  0:00 [xfs-cil/drbd2]
12849 ?     D  0:00 [xfssyncd/drbd2]
12850 ?     S  0:02 [xfsaild/drbd2]
13359 ?     S  0:00 [xfsbufd/drbd1]
13360 ?     S  0:00 [xfs-cil/drbd1]
13361 ?     D  0:00 [xfssyncd/drbd1]
13362 ?     S  0:02 [xfsaild/drbd1]

Below is part of the kernel stack. I think DRBD is stuck in conn_disconnect.

Apr 7 19:34:08 shrvm219 kernel: SysRq : Show Blocked State
Apr 7 19:34:08 shrvm219 kernel: task PC stack pid father
Apr 7 19:34:08 shrvm219 kernel: drbd_r_cic D 0000000000000000 0 12139 2 0x00000084
Apr 7 19:34:08 shrvm219 kernel: ffff88023886fd90 0000000000000046 0000000000000000 ffff88023886fd54
Apr 7 19:34:08 shrvm219 kernel: ffff88023886fd20 ffff88023fc23040 000002b2f338c2ce ffff8800283158c0
Apr 7 19:34:08 shrvm219 kernel: 00000000000005ff 000000010028bb73 ffff88023ab87058 ffff88023886ffd8
Apr 7 19:34:08 shrvm219 kernel: Call Trace:
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa0394d4d>] conn_disconnect+0x22d/0x4f0 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa0395120>] drbd_receiver+0x110/0x220 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa03a69e0>] ? drbd_thread_setup+0x0/0x110 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa03a6a0d>] drbd_thread_setup+0x2d/0x110 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa03a69e0>] ? drbd_thread_setup+0x0/0x110 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 7 19:34:08 shrvm219 kernel: xfssyncd/drbd D 0000000000000001 0 12849 2 0x00000080
Apr 7 19:34:08 shrvm219 kernel: ffff880203387ad0 0000000000000046 0000000000000000 ffff880203387a94
Apr 7 19:34:08 shrvm219 kernel: ffff880203387a30 ffff88023fc23040 000002b60ef7decd ffff8800283158c0
Apr 7 19:34:08 shrvm219 kernel: 0000000000000400 000000010028ef9d ffff88023921bab8 ffff880203387fd8
Apr 7 19:34:08 shrvm219 kernel: Call Trace:
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa039c947>] drbd_make_request+0x197/0x330 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff81270810>] generic_make_request+0x240/0x5a0
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa039cbe9>] ? drbd_merge_bvec+0x109/0x2a0 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff81270be0>] submit_bio+0x70/0x120
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa0208bba>] _xfs_buf_ioapply+0x16a/0x200 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa01ef50a>] ? xlog_bdstrat+0x2a/0x60 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa020a87f>] xfs_buf_iorequest+0x4f/0xe0 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa01ef50a>] xlog_bdstrat+0x2a/0x60 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa01f0ce9>] xlog_sync+0x269/0x3e0 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa01f0f13>] xlog_state_release_iclog+0xb3/0xf0 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa01f13a2>] _xfs_log_force+0x122/0x240 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa01f1688>] xfs_log_force+0x38/0x90 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa0214a02>] xfs_sync_worker+0x52/0xa0 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa021491e>] xfssyncd+0x17e/0x210 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa02147a0>] ? xfssyncd+0x0/0x210 [xfs]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Apr 7 19:34:08 shrvm219 kernel: xfssyncd/drbd D 0000000000000001 0 13361 2 0x00000080
Apr 7 19:34:08 shrvm219 kernel: ffff8802033edad0 0000000000000046 0000000000000000 ffff8802033eda94
Apr 7 19:34:08 shrvm219 kernel: 0000000000000000 ffff88023fc23040 000002b60ef68ea3 ffff8800283158c0
Apr 7 19:34:08 shrvm219 kernel: 00000000000007fe 000000010028ef9d ffff880239c41058 ffff8802033edfd8
Apr 7 19:34:08 shrvm219 kernel: Call Trace:
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa039c947>] drbd_make_request+0x197/0x330 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff81270810>] generic_make_request+0x240/0x5a0
Apr 7 19:34:08 shrvm219 kernel: [<ffffffffa039cbe9>] ? drbd_merge_bvec+0x109/0x2a0 [drbd]
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff81270be0>] submit_bio+0x70/0x120
Apr 7 19:34:08 shrvm219 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20

Thanks a lot in advance.
Fang
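
For readers following the report, the fencing setup it describes might look roughly like the fragment below in DRBD 8.4 syntax. This is only a sketch, not the poster's actual configuration: the resource name cic is taken from the kernel log prefix, while the handler script paths are assumptions (the Pacemaker helper scripts commonly shipped with DRBD) and the volume, disk and address definitions are omitted.

```
# Hypothetical fragment; the real configuration was not posted.
resource cic {
  disk {
    # On loss of the replication link: freeze I/O and call the fence-peer handler.
    fencing resource-and-stonith;
  }
  handlers {
    # Assumed handler scripts; any fence-peer handler could be configured here.
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # volume / on <host> sections omitted
}
```

With resource-and-stonith, I/O stays suspended until the fence-peer handler returns, which matches the susp( 0 -> 1 ) transitions in the log; the open question in the report is why the handler is never invoked when one volume is still a SyncSource.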