[DRBD-user] DRBD blocked for more than 120 seconds on CentOS 6.0 (FAIL)

Igmar Palsenberg drbd at palsenberg.com
Tue Nov 1 08:00:32 CET 2011

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


> lock up and backtrace::::::::::::::::::::::::::::::::::::::::
> 
> INFO: task drbd_w_r1:4599 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> drbd_w_r1     D ffff880818639900     0  4599      2 0x00000080
> ffff88080dd93c70 0000000000000046 ffff88081a9b8af8 ffff8800282569f0
> ffff88080dd93bf0 ffffffff81056720 ffff88080dd93c40 000000000000056c
> ffff88081a9b8638 ffff88080dd93fd8 0000000000010518 ffff88081a9b8638
> Call Trace:
> [<ffffffff81056720>] ? __dequeue_entity+0x30/0x50
> [<ffffffff8109218e>] ? prepare_to_wait+0x4e/0x80
> [<ffffffffa02098e5>] bm_page_io_async+0xe5/0x370 [drbd]
> [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
> [<ffffffffa020b8c2>] bm_rw+0x1a2/0x680 [drbd]
> [<ffffffffa0202056>] ? crc32c+0x56/0x7c [libcrc32c]
> [<ffffffffa020bdba>] drbd_bm_write_hinted+0x1a/0x20 [drbd]
> [<ffffffffa0224602>] _al_write_transaction+0x2c2/0x6a0 [drbd]
> [<ffffffffa0224d42>] w_al_write_transaction+0x22/0x50 [drbd]
> [<ffffffffa020e85e>] drbd_worker+0x10e/0x480 [drbd]
> [<ffffffffa022aa19>] drbd_thread_setup+0xa9/0x160 [drbd]
> [<ffffffff810141ca>] child_rip+0xa/0x20
> [<ffffffffa022a970>] ? drbd_thread_setup+0x0/0x160 [drbd]
> [<ffffffff810141c0>] ? child_rip+0x0/0x20

It's waiting for disk IO. What DRBD proto version are you using ?

> Initializing cgroup subsys cpuset
> Initializing cgroup subsys cpu
> Linux version 2.6.32-71.29.1.el6.x86_64 (mockbuild at c6b5.bsys.dev.centos.org) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Mon Jun 27

My advice : Test again with a vanilla kernel. RedHat kernels have tons of stuff backported, so that version number hardly reflects the actual state of the kernel. I've ran into issues with backported features before.



> d-con r1: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) 
> d-con r1: asender terminated
> d-con r1: Terminating asender thread
> block drbd0: new current UUID 36C25BAEFD88C481:49F8A559F608C6F5:09D91D5543CBB23A:09D81D5543CBB23A
> d-con r1: Connection closed
> d-con r1: conn( Disconnecting -> StandAlone ) 
> d-con r1: receiver terminated
> d-con r1: Terminating receiver thread

The syncer connection got disconnected. That's usually a bad sign. Why was the disconnect in the first place ? It might be related to the disk IO failing.

> INFO: task drbd_w_r1:4599 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> drbd_w_r1     D ffff880818639900     0  4599      2 0x00000080
> ffff88080dd93c70 0000000000000046 ffff88081a9b8af8 ffff8800282569f0
> ffff88080dd93bf0 ffffffff81056720 ffff88080dd93c40 000000000000056c
> ffff88081a9b8638 ffff88080dd93fd8 0000000000010518 ffff88081a9b8638
> Call Trace:
> [<ffffffff81056720>] ? __dequeue_entity+0x30/0x50
> [<ffffffff8109218e>] ? prepare_to_wait+0x4e/0x80
> [<ffffffffa02098e5>] bm_page_io_async+0xe5/0x370 [drbd]
> [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
> [<ffffffffa020b8c2>] bm_rw+0x1a2/0x680 [drbd]
> [<ffffffffa0202056>] ? crc32c+0x56/0x7c [libcrc32c]
> [<ffffffffa020bdba>] drbd_bm_write_hinted+0x1a/0x20 [drbd]
> [<ffffffffa0224602>] _al_write_transaction+0x2c2/0x6a0 [drbd]
> [<ffffffffa0224d42>] w_al_write_transaction+0x22/0x50 [drbd]
> [<ffffffffa020e85e>] drbd_worker+0x10e/0x480 [drbd]
> [<ffffffffa022aa19>] drbd_thread_setup+0xa9/0x160 [drbd]
> [<ffffffff810141ca>] child_rip+0xa/0x20
> [<ffffffffa022a970>] ? drbd_thread_setup+0x0/0x160 [drbd]
> [<ffffffff810141c0>] ? child_rip+0x0/0x20

You're sure your slave can keep up with this machine ? I've seen cases where things went bad because the other side's IO subsystems where waaaaaaay slower then the masters, so it kept lagging behind, and eventually things broke.




	Igmar


More information about the drbd-user mailing list