[DRBD-user] Stacked DRBD device hangs

Christoph Mitasch cmitasch at thomas-krenn.com
Wed Dec 10 11:16:07 CET 2014


Hi,

We recently had an issue with a stacked DRBD device (8.3.16): it started to block I/O after the connection state switched from Ahead to SyncSource.
ko-count is set to 6. The kernel log from node2 follows:

Dec  8 03:35:08 node2 kernel: [668315.119697] block drbd20: helper command: /sbin/drbdadm before-resync-source minor-20 exit code 0 (0x0)
Dec  8 03:35:08 node2 kernel: [668315.119706] block drbd20: conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent ) 
Dec  8 03:35:08 node2 kernel: [668315.119716] block drbd20: ASSERT( !(remote && send_oos) ) in /var/lib/dkms/drbd/8.3.16/build/drbd/drbd_req.c:1001
Dec  8 03:35:08 node2 kernel: [668315.119729] block drbd20: Began resync as SyncSource (will sync 216 KB [54 bits set]).
Dec  8 03:35:08 node2 kernel: [668315.120419] block drbd20: updated sync UUID 024B346E4B84E12B:86C8E56E6CD2BBDC:9D97BCB66EBE838D:3E5876F017C7CDBD
Dec  8 03:35:49 node2 kernel: [668356.840611] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
Dec  8 03:35:49 node2 kernel: [668356.865459] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
Dec  8 03:35:49 node2 kernel: [668356.903126] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
Dec  8 03:35:49 node2 kernel: [668356.930498] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
Dec  8 03:36:00 node2 kernel: [668367.006241] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
Dec  8 03:36:00 node2 kernel: [668367.030987] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
Dec  8 03:36:10 node2 kernel: [668377.249395] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
Dec  8 03:36:13 node2 kernel: [668380.608957] block drbd20: Remote failed to finish a request within ko-count * timeout
Dec  8 03:36:13 node2 kernel: [668380.632397] block drbd20: peer( Secondary -> Unknown ) conn( SyncSource -> Timeout ) 
Dec  8 03:36:13 node2 kernel: [668380.632440] block drbd20: error receiving CsumRSRequest, l: 44!
Dec  8 03:36:13 node2 kernel: [668380.645119] block drbd20: asender terminated
Dec  8 03:36:13 node2 kernel: [668380.645131] block drbd20: Terminating drbd20_asender
Dec  8 03:37:32 node2 kernel: [668459.482874] INFO: task jbd2/dm-4-8:9503 blocked for more than 120 seconds.
Dec  8 03:37:32 node2 kernel: [668459.494628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  8 03:37:32 node2 kernel: [668459.518881] jbd2/dm-4-8     D ffffffff81806240     0  9503      2 0x00000000
Dec  8 03:37:32 node2 kernel: [668459.518888]  ffff881017c57ac0 0000000000000046 ffff881017c57a60 ffffffff8103ec29
Dec  8 03:37:32 node2 kernel: [668459.542394]  ffff881017c57fd8 ffff881017c57fd8 ffff881017c57fd8 00000000000137c0
Dec  8 03:37:32 node2 kernel: [668459.565602]  ffff8810197b4500 ffff88100a612e00 ffff881017c57a90 ffff88207fcb4080
Dec  8 03:37:32 node2 kernel: [668459.588736] Call Trace:
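
For anyone who wants to check their own kernel logs for the same pattern, here is a minimal sketch in Python. It only relies on the message formats visible in the excerpt above; the names summarize, TRANSITION, RS_COUNTERS and DEVICE are made up for this illustration and are not part of DRBD or its tools. It collects state transitions such as "conn( Ahead -> SyncSource )" and counts the "rs_left > rs_total" warnings.

```python
import re

# State-change fragments as they appear in the log, e.g. "conn( Ahead -> SyncSource )".
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")
# Resync counter warning, e.g. "rs_left=55 > rs_total=54".
RS_COUNTERS = re.compile(r"rs_left=(\d+) > rs_total=(\d+)")
# Device prefix of each DRBD kernel message, e.g. "block drbd20:".
DEVICE = re.compile(r"block (drbd\d+):")


def summarize(lines):
    """Return (transitions, overruns) for the DRBD messages in `lines`.

    transitions: list of (device, field, old_state, new_state) tuples
    overruns:    number of lines where rs_left exceeds rs_total
    """
    transitions = []
    overruns = 0
    for line in lines:
        dev = DEVICE.search(line)
        if dev is None:
            continue  # not a DRBD message
        for field, old, new in TRANSITION.findall(line):
            transitions.append((dev.group(1), field, old, new))
        counters = RS_COUNTERS.search(line)
        if counters and int(counters.group(1)) > int(counters.group(2)):
            overruns += 1
    return transitions, overruns


# Three lines copied from the log excerpt above.
SAMPLE = [
    "Dec  8 03:35:08 node2 kernel: [668315.119706] block drbd20: "
    "conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent ) ",
    "Dec  8 03:35:49 node2 kernel: [668356.840611] block drbd20: "
    "cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)",
    "Dec  8 03:36:13 node2 kernel: [668380.632397] block drbd20: "
    "peer( Secondary -> Unknown ) conn( SyncSource -> Timeout ) ",
]

if __name__ == "__main__":
    transitions, overruns = summarize(SAMPLE)
    for device, field, old, new in transitions:
        print(f"{device}: {field} {old} -> {new}")
    print(f"rs_left > rs_total warnings: {overruns}")
```

To run it against a real log, something like summarize(open("/var/log/kern.log")) should work, assuming the log lives at that path on your system.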

Is this a known problem, and has it been fixed in DRBD 8.4?

Thank you,
Christoph


