Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello list,
I am experiencing a stall during online verification on our two-node DRBD
(RHCS) cluster, running RHEL 6.0 (2.6.32-71.el6.x86_64) and DRBD 8.3.10
(api:88).
The SATA disks are configured as a 1 TB RAID 1 array, with a single LVM
logical volume used as the DRBD resource and formatted as ext4. The servers
are connected directly (bypassing the switch), using two GigE network cards
bonded in load-balancing (round-robin) mode on both servers.
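For reference, the bonding setup on each node looks roughly like this (a
sketch of a RHEL 6 balance-rr bond; the interface names and exact options
below are placeholders, not copied from our nodes):
----------%<----------
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.2.30.141        # replication address DRBD binds to on node01
NETMASK=255.255.255.0
BONDING_OPTS="mode=balance-rr miimon=100"   # options illustrative

# /etc/sysconfig/network-scripts/ifcfg-eth1 (and likewise for eth2)
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
----------%<----------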
When I run the online verification, I see a fair number of drbd0_worker
"sock_sendmsg time expired" messages; my first guess is that the
drbd0_worker process gets stuck while trying to send some blocks?
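For completeness, I start the verify with the standard DRBD command and
watch the progress in /proc/drbd (nothing unusual here):
----------%<----------
# kick off the online verify of the resource
drbdadm verify shared_partition
# watch the progress
watch -n1 cat /proc/drbd
----------%<----------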
I first noticed this yesterday morning, when the output of /proc/drbd was:
----------%<----------
# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root at node01,
2011-02-23 17:38:20
 0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:295912 nr:94208 dw:390256 dr:490915949 al:38 bm:19 lo:1 pe:61424
ua:2048 ap:6 ep:1 wo:b oos:0
        [=========>..........] verified: 51.9% (444468/923796)M
        finish: 434147:56:06 speed: 0 (124) want: 112,640 K/sec (stalled)
----------%<----------
I had to restart the servers to resolve the problem. I have since run the
verification again, but the problem is the same; I can't even run 'ls' on
the DRBD partition (the command just hangs). When I ran the verification
again today, the logs on the primary node printed the following:
----------%<----------
Apr 20 11:03:01 node02 kernel: block drbd0: conn( Connected -> VerifyS )
Apr 20 11:03:01 node02 kernel: block drbd0: Starting Online Verify from
sector 0
Apr 20 11:04:20 node02 kernel: block drbd0: [drbd0_worker/1851]
sock_sendmsg time expired, ko = 4294967295
Apr 20 11:04:26 node02 kernel: block drbd0: [drbd0_worker/1851]
sock_sendmsg time expired, ko = 4294967294
.
.
.
Apr 20 11:07:50 node02 kernel: INFO: task jbd2/drbd0-8:3571 blocked for
more than 120 seconds.
Apr 20 11:07:50 node02 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 11:07:50 node02 kernel: jbd2/drbd0-8  D ffff88013fc24300     0 
3571      2 0x00000080
Apr 20 11:07:50 node02 kernel: ffff88013957fd10 0000000000000046
0000000000000000 0000000000000008
Apr 20 11:07:50 node02 kernel: 0000000000016980 0000000000016980
ffff88002be10d80 00000001001f1d4f
Apr 20 11:07:50 node02 kernel: ffff880128149a98 ffff88013957ffd8
0000000000010518 ffff880128149a98
Apr 20 11:07:50 node02 kernel: Call Trace:
Apr 20 11:07:50 node02 kernel: [<ffffffffa00d88f0>]
jbd2_journal_commit_transaction+0x1c0/0x14e0 [jbd2]
Apr 20 11:07:50 node02 kernel: [<ffffffff810116e0>] ? __switch_to+0xd0/0x320
Apr 20 11:07:50 node02 kernel: [<ffffffff8107d5ac>] ?
lock_timer_base+0x3c/0x70
Apr 20 11:07:50 node02 kernel: [<ffffffff81091ca0>] ?
autoremove_wake_function+0x0/0x40
Apr 20 11:07:50 node02 kernel: [<ffffffffa00df0b8>] kjournald2+0xb8/0x220
[jbd2]
Apr 20 11:07:50 node02 kernel: [<ffffffff81091ca0>] ?
autoremove_wake_function+0x0/0x40
Apr 20 11:07:50 node02 kernel: [<ffffffffa00df000>] ? kjournald2+0x0/0x220
[jbd2]
Apr 20 11:07:50 node02 kernel: [<ffffffff81091936>] kthread+0x96/0xa0
Apr 20 11:07:50 node02 kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Apr 20 11:07:50 node02 kernel: [<ffffffff810918a0>] ? kthread+0x0/0xa0
Apr 20 11:07:50 node02 kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
Apr 20 11:07:50 node02 kernel: INFO: task touch:4825 blocked for more than
120 seconds.
Apr 20 11:07:50 node02 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 11:07:50 node02 kernel: touch         D ffffffff8119dcf0     0 
4825   4770 0x00000080
Apr 20 11:07:50 node02 kernel: ffff88012623fa08 0000000000000086
ffff88012623f988 00000000a0192deb
Apr 20 11:07:50 node02 kernel: ffff880136b91400 ffff8801362e94e0
ffff8801369b1c00 0000000000000002
Apr 20 11:07:50 node02 kernel: ffff880138a045f8 ffff88012623ffd8
0000000000010518 ffff880138a045f8
Apr 20 11:07:50 node02 kernel: Call Trace:
Apr 20 11:07:50 node02 kernel: [<ffffffff8109b9a9>] ? ktime_get_ts+0xa9/0xe0
Apr 20 11:07:50 node02 kernel: [<ffffffff8119dcf0>] ? sync_buffer+0x0/0x50
Apr 20 11:07:50 node02 kernel: [<ffffffff814c8a23>] io_schedule+0x73/0xc0
Apr 20 11:07:50 node02 kernel: [<ffffffff8119dd30>] sync_buffer+0x40/0x50
Apr 20 11:07:50 node02 kernel: [<ffffffff814c914a>]
__wait_on_bit_lock+0x5a/0xc0
Apr 20 11:07:50 node02 kernel: [<ffffffff8119dcf0>] ? sync_buffer+0x0/0x50
Apr 20 11:07:50 node02 kernel: [<ffffffff814c9228>]
out_of_line_wait_on_bit_lock+0x78/0x90
Apr 20 11:07:50 node02 kernel: [<ffffffff81091ce0>] ?
wake_bit_function+0x0/0x50
Apr 20 11:07:50 node02 kernel: [<ffffffff8119ded6>] __lock_buffer+0x36/0x40
Apr 20 11:07:50 node02 kernel: [<ffffffffa00d82b3>]
do_get_write_access+0x473/0x500 [jbd2]
Apr 20 11:07:50 node02 kernel: [<ffffffff8119d3fc>] ? __getblk+0x2c/0x2e0
Apr 20 11:07:50 node02 kernel: [<ffffffff81200840>] ?
security_inode_alloc+0x40/0x60
Apr 20 11:07:50 node02 kernel: [<ffffffffa00d8491>]
jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Apr 20 11:07:50 node02 kernel: [<ffffffffa012bd48>]
__ext4_journal_get_write_access+0x38/0x80 [ext4]
Apr 20 11:07:50 node02 kernel: [<ffffffffa01036c2>]
ext4_new_inode+0x3a2/0x10e0 [ext4]
Apr 20 11:07:50 node02 kernel: [<ffffffffa00d74e5>] ?
jbd2_journal_start+0xb5/0x100 [jbd2]
Apr 20 11:07:50 node02 kernel: [<ffffffffa0111a60>] ext4_create+0xc0/0x150
[ext4]
Apr 20 11:07:50 node02 kernel: [<ffffffff81179c24>] vfs_create+0xb4/0xe0
Apr 20 11:07:50 node02 kernel: [<ffffffff8117d766>] do_filp_open+0xa46/0xd40
Apr 20 11:07:50 node02 kernel: [<ffffffff81135837>] ?
handle_pte_fault+0xf7/0xad0
Apr 20 11:07:50 node02 kernel: [<ffffffff811892f2>] ? alloc_fd+0x92/0x160
Apr 20 11:07:50 node02 kernel: [<ffffffff81169929>] do_sys_open+0x69/0x140
Apr 20 11:07:50 node02 kernel: [<ffffffff81169a40>] sys_open+0x20/0x30
Apr 20 11:07:50 node02 kernel: [<ffffffff81013172>]
system_call_fastpath+0x16/0x1b
Apr 20 11:07:50 node02 kernel: block drbd0: [drbd0_worker/1851]
sock_sendmsg time expired, ko = 4294967260
Apr 20 11:07:56 node02 kernel: block drbd0: [drbd0_worker/1851]
sock_sendmsg time expired, ko = 4294967259
Apr 20 11:08:02 node02 kernel: block drbd0: [drbd0_worker/1851]
sock_sendmsg time expired, ko = 4294967258
Apr 20 11:08:08 node02 kernel: block drbd0: [drbd0_worker/1851]
sock_sendmsg time expired, ko = 4294967257
Apr 20 11:08:14 node02 kernel: block drbd0: [drbd0_worker/1851]
sock_sendmsg time expired, ko = 4294967256
.
.
.
----------%<----------
The logs on the secondary show the same "sock_sendmsg time expired"
messages, but without the blocked-task messages for jbd2 and touch.
/proc/drbd on primary node:
----------%<----------
# cat /proc/drbd
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by root at node02,
2011-02-23 17:37:52
 0: cs:VerifyS ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:2120 nr:0 dw:2140 dr:6689633 al:7 bm:0 lo:1 pe:9356 ua:2048 ap:5
ep:1 wo:b oos:0
        [>....................] verified:  0.8% (917276/923796)M
        finish: 642:35:47 speed: 404 (1,464) want: 112,640 K/sec
----------%<----------
DRBD configuration:
----------%<----------
global {
  usage-count no;
}
common {
  protocol C;
}
resource shared_partition {
        handlers {
          split-brain "/lib/drbd/notify-split-brain.sh root";
        }
        syncer {
          rate 110M;
          verify-alg sha1;
        }
  on node01. {
    device    /dev/drbd0;
    disk      /dev/vg_node01/lv_shared;
    address   10.2.30.141:7789;
    meta-disk internal;
  }
  on node02. {
    device    /dev/drbd0;
    disk      /dev/vg_node02/lv_shared;
    address   10.2.30.142:7789;
    meta-disk internal;
  }
}
----------%<----------
Processes on primary:
----------%<----------
# ps -eo pid,args,state,wchan | grep drbd
 1851 [drbd0_worker]              S sk_stream_wait_memory
 1859 [drbd0_receiver]            S drbd_pp_alloc
 1878 [drbd0_asender]             S sk_wait_data
 3571 [jbd2/drbd0-8]              D jbd2_journal_commit_transaction
 5872 grep drbd                   R -
----------%<----------
Processes on secondary:
----------%<----------
# ps -e -o pid,args,stat,wchan | grep drbd
 1862 [drbd0_worker]              S    sk_stream_wait_memory
 1870 [drbd0_receiver]            S    drbd_pp_alloc
 1889 [drbd0_asender]             S    sk_wait_data
 5159 grep drbd                   S+   pipe_wait
----------%<----------
I have also tested the network throughput over the bonded interface (which
DRBD uses for synchronization and verification), and it is around 40 MB/s.
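Roughly the kind of test used (the iperf invocation below is just an
illustration of how such a measurement could be done between the
replication addresses, not necessarily the exact commands):
----------%<----------
# on node01: run the server side
iperf -s
# on node02: push traffic over the bonded link for 30 seconds
iperf -c 10.2.30.141 -t 30
----------%<----------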
Does anyone have an idea what is behind this problem?