On Wed, Dec 10, 2014 at 11:16:07AM +0100, Christoph Mitasch wrote:
> Hi,
>
> we recently had an issue with a stacked DRBD device (8.3.16) that
> started to block IO after switching from Ahead to SyncSource.
> ko-count is set to 6.
>
> Dec 8 03:35:08 node2 kernel: [668315.119697] block drbd20: helper command: /sbin/drbdadm before-resync-source minor-20 exit code 0 (0x0)
> Dec 8 03:35:08 node2 kernel: [668315.119706] block drbd20: conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent )
> Dec 8 03:35:08 node2 kernel: [668315.119716] block drbd20: ASSERT( !(remote && send_oos) ) in /var/lib/dkms/drbd/8.3.16/build/drbd/drbd_req.c:1001
> Dec 8 03:35:08 node2 kernel: [668315.119729] block drbd20: Began resync as SyncSource (will sync 216 KB [54 bits set]).
> Dec 8 03:35:08 node2 kernel: [668315.120419] block drbd20: updated sync UUID 024B346E4B84E12B:86C8E56E6CD2BBDC:9D97BCB66EBE838D:3E5876F017C7CDBD
> Dec 8 03:35:49 node2 kernel: [668356.840611] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:35:49 node2 kernel: [668356.865459] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:35:49 node2 kernel: [668356.903126] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:35:49 node2 kernel: [668356.930498] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:00 node2 kernel: [668367.006241] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:00 node2 kernel: [668367.030987] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:10 node2 kernel: [668377.249395] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:13 node2 kernel: [668380.608957] block drbd20: Remote failed to finish a request within ko-count * timeout
> Dec 8 03:36:13 node2 kernel: [668380.632397] block drbd20: peer( Secondary -> Unknown ) conn( SyncSource -> Timeout )
> Dec 8 03:36:13 node2 kernel: [668380.632440] block drbd20: error receiving CsumRSRequest, l: 44!
> Dec 8 03:36:13 node2 kernel: [668380.645119] block drbd20: asender terminated
> Dec 8 03:36:13 node2 kernel: [668380.645131] block drbd20: Terminating drbd20_asender
> Dec 8 03:37:32 node2 kernel: [668459.482874] INFO: task jbd2/dm-4-8:9503 blocked for more than 120 seconds.
> Dec 8 03:37:32 node2 kernel: [668459.494628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 8 03:37:32 node2 kernel: [668459.518881] jbd2/dm-4-8 D ffffffff81806240 0 9503 2 0x00000000
> Dec 8 03:37:32 node2 kernel: [668459.518888] ffff881017c57ac0 0000000000000046 ffff881017c57a60 ffffffff8103ec29
> Dec 8 03:37:32 node2 kernel: [668459.542394] ffff881017c57fd8 ffff881017c57fd8 ffff881017c57fd8 00000000000137c0
> Dec 8 03:37:32 node2 kernel: [668459.565602] ffff8810197b4500 ffff88100a612e00 ffff881017c57a90 ffff88207fcb4080
> Dec 8 03:37:32 node2 kernel: [668459.588736] Call Trace:
>
> Is this a known problem and fixed in DRBD 8.4?

Probably?
I think I remember something about fixing state handling
getting stuck in "Timeout".

	Lars

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed