[DRBD-user] IO Error Logging

Andrew Eross eross at locatrix.com
Sun Oct 7 17:13:12 CEST 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


There we go -- just hadn't noticed before, but CentOS5 (this is a XenServer
actually) was writing out the same stuff to /var/log/kern.log with
timestamps.

I reproduced this again just to make sure I have my story straight.

I have vh03 and vh04, with vh04 being the primary right now.

*First I make vh04 secondary:*
[root at vh03 ~]# date
Mon Oct  8 00:47:30 EST 2012
[root at vh04 ~]# drbdadm secondary drbd-sr1
[root at vh04 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by
root at xenddk.eightmile.locatrix.net, 2012-09-20 14:40:39
 1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:551471038 nr:102855 dw:339447098 dr:893008359 al:52675 bm:35090 lo:0
pe:0 ua:0 ap:0 ep:1 wo:d oos:0

*Kernel log on vh04 shows:*
Oct  8 00:47:37 vh04 kernel: block drbd1: role( Primary -> Secondary )
Oct  8 00:47:37 vh04 kernel: block drbd1: bitmap WRITE of 0 pages took 0
jiffies
Oct  8 00:47:37 vh04 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync
by on disk bit-map.

*No issues, that worked fine, returns immediately as expected.*

*Now I make vh03 the primary:*
[root at vh03 ~]# date
Mon Oct  8 00:48:00 EST 2012
[root at vh03 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by
root at xenddk.eightmile.locatrix.net, 2012-09-20 14:40:39
 1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:102855 nr:530600089 dw:530702944 dr:288517 al:196 bm:25600 lo:0 pe:0
ua:0 ap:0 ep:1 wo:f oos:0
[root at vh03 ~]# drbdadm primary drbd-sr1

*Right away vh03 freezes up. So I'm no longer getting any console
output...but 15 minutes later it comes alive again...this time SSH didn't
even drop, it just started working again after 15 minutes, and this is what
I see:*
Command 'drbdsetup primary 1' did not terminate within 121 seconds
[root at vh03 ~]# date
Mon Oct  8 01:04:19 EST 2012

*Now that it's alive again, here's what the kernel log had to say during
that time:*
Oct  8 00:51:39 vh03 kernel: INFO: task ovs-vswitchd:5724 blocked for more
than 120 seconds.
Oct  8 00:51:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:51:39 vh03 kernel: ovs-vswitchd  D ea841ce8     0  5724   5723
0x00000004
Oct  8 00:51:39 vh03 kernel:  ea841cfc 00200286 00000002 ea841ce8 c16eb224
00000000 00200200 ee404050
Oct  8 00:51:39 vh03 kernel:  ea8a2140 00000019 eddd99d4 eddd98c4 eddd9830
eddd99d4 c16ec100 00000003
Oct  8 00:51:39 vh03 kernel:  eb7e1900 00d8bdf8 00000000 c022a07e ea841f08
00000001 00000001 ea841cec
Oct  8 00:51:39 vh03 kernel: Call Trace:
Oct  8 00:51:39 vh03 kernel:  [<c022a07e>] ? copy_to_user+0x3e/0x60
Oct  8 00:51:39 vh03 kernel:  [<c0383c2b>] __mutex_lock_slowpath+0x10b/0x160
Oct  8 00:51:39 vh03 kernel:  [<c0383a80>] mutex_lock+0x20/0x40
Oct  8 00:51:39 vh03 kernel:  [<c03229a0>] genl_rcv+0x10/0x30
Oct  8 00:51:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:51:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:51:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:51:39 vh03 kernel:  [<c02f88ea>] sock_sendmsg+0xfa/0x110
Oct  8 00:51:39 vh03 kernel:  [<c01458a0>] ? enqueue_hrtimer+0x60/0x80
Oct  8 00:51:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:51:39 vh03 kernel:  [<c02f7ac0>] ? sock_destroy_inode+0x10/0x20
Oct  8 00:51:39 vh03 kernel:  [<c0229eac>] ? copy_from_user+0x3c/0x70
Oct  8 00:51:39 vh03 kernel:  [<c02f9493>] ? sys_recvmsg+0xf3/0x1c0
Oct  8 00:51:39 vh03 kernel:  [<c02f80a0>] ? sockfd_lookup_light+0x30/0x60
Oct  8 00:51:39 vh03 kernel:  [<c02f8bc5>] sys_sendto+0xa5/0xd0
Oct  8 00:51:39 vh03 kernel:  [<c0194625>] ? put_unused_fd+0x35/0x70
Oct  8 00:51:39 vh03 kernel:  [<c02f9973>] ? sys_accept4+0x1b3/0x1e0
Oct  8 00:51:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:51:39 vh03 kernel:  [<c02f8c26>] sys_send+0x36/0x40
Oct  8 00:51:39 vh03 kernel:  [<c02f9ae6>] sys_socketcall+0x146/0x270
Oct  8 00:51:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:51:39 vh03 kernel: INFO: task drbdsetup:18163 blocked for more
than 120 seconds.
Oct  8 00:51:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:51:39 vh03 kernel: drbdsetup     D 00000002     0 18163      1
0x00000000
Oct  8 00:51:39 vh03 kernel:  e9ecdc68 00000286 00000001 00000002 ed014410
e9ecdc04 e9ecdc14 f0c9ac78
Oct  8 00:51:39 vh03 kernel:  8c8c6ebd 00008176 ed817214 ed817104 ed817070
ed817214 c16ec100 00000003
Oct  8 00:51:39 vh03 kernel:  edc08200 f0c8b32b 00000000 00003104 00000000
00000005 002689a5 ed014400
Oct  8 00:51:39 vh03 kernel: Call Trace:
Oct  8 00:51:39 vh03 kernel:  [<f0c9ac78>] ? conn_highest_peer+0x48/0x60
[drbd]
Oct  8 00:51:39 vh03 kernel:  [<f0c8b32b>] ? drbd_send_command+0x3b/0x50
[drbd]
Oct  8 00:51:39 vh03 kernel:  [<f0c9bfe8>] ? _req_st_cond+0xe8/0x130 [drbd]
Oct  8 00:51:39 vh03 kernel:  [<f0c9e846>] drbd_req_state+0x146/0x300 [drbd]
Oct  8 00:51:39 vh03 kernel:  [<c0139d23>] ? signal_wake_up+0x23/0x40
Oct  8 00:51:39 vh03 kernel:  [<c0139e19>] ? complete_signal+0xd9/0x1b0
Oct  8 00:51:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:51:39 vh03 kernel:  [<f0c9ea1e>] _drbd_request_state+0x1e/0xa0
[drbd]
Oct  8 00:51:39 vh03 kernel:  [<c013ac4b>] ? force_sig_info+0x9b/0xb0
Oct  8 00:51:39 vh03 kernel:  [<f0c96303>] drbd_set_role+0x53/0x550 [drbd]
Oct  8 00:51:39 vh03 kernel:  [<f0c93f45>] ? drbd_adm_prepare+0x255/0x430
[drbd]
Oct  8 00:51:39 vh03 kernel:  [<f0c96bed>] drbd_adm_set_role+0x9d/0xc0
[drbd]
Oct  8 00:51:39 vh03 kernel:  [<c0323c68>] genl_rcv_msg+0x158/0x190
Oct  8 00:51:39 vh03 kernel:  [<c0323b10>] ? genl_rcv_msg+0x0/0x190
Oct  8 00:51:39 vh03 kernel:  [<c0320fcd>] netlink_rcv_skb+0x7d/0xa0
Oct  8 00:51:39 vh03 kernel:  [<c03229ac>] genl_rcv+0x1c/0x30
Oct  8 00:51:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:51:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:51:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:51:39 vh03 kernel:  [<c02f7f3a>] sock_aio_write+0x10a/0x120
Oct  8 00:51:39 vh03 kernel:  [<c019631d>] do_sync_write+0xcd/0x110
Oct  8 00:51:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:51:39 vh03 kernel:  [<c0201aaf>] ?
security_file_permission+0xf/0x20
Oct  8 00:51:39 vh03 kernel:  [<c01964d0>] ? rw_verify_area+0x60/0xc0
Oct  8 00:51:39 vh03 kernel:  [<c0196b56>] vfs_write+0x146/0x150
Oct  8 00:51:39 vh03 kernel:  [<c01970fd>] sys_write+0x3d/0x70
Oct  8 00:51:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:53:39 vh03 kernel: INFO: task ovs-vswitchd:5724 blocked for more
than 120 seconds.
Oct  8 00:53:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:53:39 vh03 kernel: ovs-vswitchd  D ea841ce8     0  5724   5723
0x00000004
Oct  8 00:53:39 vh03 kernel:  ea841cfc 00200286 00000002 ea841ce8 c16eb224
00000000 00200200 ee404050
Oct  8 00:53:39 vh03 kernel:  ea8a2140 00000019 eddd99d4 eddd98c4 eddd9830
eddd99d4 c16ec100 00000003
Oct  8 00:53:39 vh03 kernel:  eb7e1900 00d8bdf8 00000000 c022a07e ea841f08
00000001 00000001 ea841cec
Oct  8 00:53:39 vh03 kernel: Call Trace:
Oct  8 00:53:39 vh03 kernel:  [<c022a07e>] ? copy_to_user+0x3e/0x60
Oct  8 00:53:39 vh03 kernel:  [<c0383c2b>] __mutex_lock_slowpath+0x10b/0x160
Oct  8 00:53:39 vh03 kernel:  [<c0383a80>] mutex_lock+0x20/0x40
Oct  8 00:53:39 vh03 kernel:  [<c03229a0>] genl_rcv+0x10/0x30
Oct  8 00:53:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:53:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:53:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:53:39 vh03 kernel:  [<c02f88ea>] sock_sendmsg+0xfa/0x110
Oct  8 00:53:39 vh03 kernel:  [<c01458a0>] ? enqueue_hrtimer+0x60/0x80
Oct  8 00:53:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:53:39 vh03 kernel:  [<c02f7ac0>] ? sock_destroy_inode+0x10/0x20
Oct  8 00:53:39 vh03 kernel:  [<c0229eac>] ? copy_from_user+0x3c/0x70
Oct  8 00:53:39 vh03 kernel:  [<c02f9493>] ? sys_recvmsg+0xf3/0x1c0
Oct  8 00:53:39 vh03 kernel:  [<c02f80a0>] ? sockfd_lookup_light+0x30/0x60
Oct  8 00:53:39 vh03 kernel:  [<c02f8bc5>] sys_sendto+0xa5/0xd0
Oct  8 00:53:39 vh03 kernel:  [<c0194625>] ? put_unused_fd+0x35/0x70
Oct  8 00:53:39 vh03 kernel:  [<c02f9973>] ? sys_accept4+0x1b3/0x1e0
Oct  8 00:53:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:53:39 vh03 kernel:  [<c02f8c26>] sys_send+0x36/0x40
Oct  8 00:53:39 vh03 kernel:  [<c02f9ae6>] sys_socketcall+0x146/0x270
Oct  8 00:53:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:53:39 vh03 kernel: INFO: task drbdsetup:18163 blocked for more
than 120 seconds.
Oct  8 00:53:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:53:39 vh03 kernel: drbdsetup     D 00000002     0 18163      1
0x00000000
Oct  8 00:53:39 vh03 kernel:  e9ecdc68 00000286 00000001 00000002 ed014410
e9ecdc04 e9ecdc14 f0c9ac78
Oct  8 00:53:39 vh03 kernel:  8c8c6ebd 00008176 ed817214 ed817104 ed817070
ed817214 c16ec100 00000003
Oct  8 00:53:39 vh03 kernel:  edc08200 f0c8b32b 00000000 00003104 00000000
00000005 002689a5 ed014400
Oct  8 00:53:39 vh03 kernel: Call Trace:
Oct  8 00:53:39 vh03 kernel:  [<f0c9ac78>] ? conn_highest_peer+0x48/0x60
[drbd]
Oct  8 00:53:39 vh03 kernel:  [<f0c8b32b>] ? drbd_send_command+0x3b/0x50
[drbd]
Oct  8 00:53:39 vh03 kernel:  [<f0c9bfe8>] ? _req_st_cond+0xe8/0x130 [drbd]
Oct  8 00:53:39 vh03 kernel:  [<f0c9e846>] drbd_req_state+0x146/0x300 [drbd]
Oct  8 00:53:39 vh03 kernel:  [<c0139d23>] ? signal_wake_up+0x23/0x40
Oct  8 00:53:39 vh03 kernel:  [<c0139e19>] ? complete_signal+0xd9/0x1b0
Oct  8 00:53:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:53:39 vh03 kernel:  [<f0c9ea1e>] _drbd_request_state+0x1e/0xa0
[drbd]
Oct  8 00:53:39 vh03 kernel:  [<c013ac4b>] ? force_sig_info+0x9b/0xb0
Oct  8 00:53:39 vh03 kernel:  [<f0c96303>] drbd_set_role+0x53/0x550 [drbd]
Oct  8 00:53:39 vh03 kernel:  [<f0c93f45>] ? drbd_adm_prepare+0x255/0x430
[drbd]
Oct  8 00:53:39 vh03 kernel:  [<f0c96bed>] drbd_adm_set_role+0x9d/0xc0
[drbd]
Oct  8 00:53:39 vh03 kernel:  [<c0323c68>] genl_rcv_msg+0x158/0x190
Oct  8 00:53:39 vh03 kernel:  [<c0323b10>] ? genl_rcv_msg+0x0/0x190
Oct  8 00:53:39 vh03 kernel:  [<c0320fcd>] netlink_rcv_skb+0x7d/0xa0
Oct  8 00:53:39 vh03 kernel:  [<c03229ac>] genl_rcv+0x1c/0x30
Oct  8 00:53:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:53:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:53:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:53:39 vh03 kernel:  [<c02f7f3a>] sock_aio_write+0x10a/0x120
Oct  8 00:53:39 vh03 kernel:  [<c019631d>] do_sync_write+0xcd/0x110
Oct  8 00:53:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:53:39 vh03 kernel:  [<c0201aaf>] ?
security_file_permission+0xf/0x20
Oct  8 00:53:39 vh03 kernel:  [<c01964d0>] ? rw_verify_area+0x60/0xc0
Oct  8 00:53:39 vh03 kernel:  [<c0196b56>] vfs_write+0x146/0x150
Oct  8 00:53:39 vh03 kernel:  [<c01970fd>] sys_write+0x3d/0x70
Oct  8 00:53:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:55:39 vh03 kernel: INFO: task ovs-vswitchd:5724 blocked for more
than 120 seconds.
Oct  8 00:55:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:55:39 vh03 kernel: ovs-vswitchd  D ea841ce8     0  5724   5723
0x00000004
Oct  8 00:55:39 vh03 kernel:  ea841cfc 00200286 00000002 ea841ce8 c16eb224
00000000 00200200 ee404050
Oct  8 00:55:39 vh03 kernel:  ea8a2140 00000019 eddd99d4 eddd98c4 eddd9830
eddd99d4 c16ec100 00000003
Oct  8 00:55:39 vh03 kernel:  eb7e1900 00d8bdf8 00000000 c022a07e ea841f08
00000001 00000001 ea841cec
Oct  8 00:55:39 vh03 kernel: Call Trace:
Oct  8 00:55:39 vh03 kernel:  [<c022a07e>] ? copy_to_user+0x3e/0x60
Oct  8 00:55:39 vh03 kernel:  [<c0383c2b>] __mutex_lock_slowpath+0x10b/0x160
Oct  8 00:55:39 vh03 kernel:  [<c0383a80>] mutex_lock+0x20/0x40
Oct  8 00:55:39 vh03 kernel:  [<c03229a0>] genl_rcv+0x10/0x30
Oct  8 00:55:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:55:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:55:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:55:39 vh03 kernel:  [<c02f88ea>] sock_sendmsg+0xfa/0x110
Oct  8 00:55:39 vh03 kernel:  [<c01458a0>] ? enqueue_hrtimer+0x60/0x80
Oct  8 00:55:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:55:39 vh03 kernel:  [<c02f7ac0>] ? sock_destroy_inode+0x10/0x20
Oct  8 00:55:39 vh03 kernel:  [<c0229eac>] ? copy_from_user+0x3c/0x70
Oct  8 00:55:39 vh03 kernel:  [<c02f9493>] ? sys_recvmsg+0xf3/0x1c0
Oct  8 00:55:39 vh03 kernel:  [<c02f80a0>] ? sockfd_lookup_light+0x30/0x60
Oct  8 00:55:39 vh03 kernel:  [<c02f8bc5>] sys_sendto+0xa5/0xd0
Oct  8 00:55:39 vh03 kernel:  [<c0194625>] ? put_unused_fd+0x35/0x70
Oct  8 00:55:39 vh03 kernel:  [<c02f9973>] ? sys_accept4+0x1b3/0x1e0
Oct  8 00:55:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:55:39 vh03 kernel:  [<c02f8c26>] sys_send+0x36/0x40
Oct  8 00:55:39 vh03 kernel:  [<c02f9ae6>] sys_socketcall+0x146/0x270
Oct  8 00:55:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:55:39 vh03 kernel: INFO: task drbdsetup:18163 blocked for more
than 120 seconds.
Oct  8 00:55:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:55:39 vh03 kernel: drbdsetup     D 00000002     0 18163      1
0x00000000
Oct  8 00:55:39 vh03 kernel:  e9ecdc68 00000286 00000001 00000002 ed014410
e9ecdc04 e9ecdc14 f0c9ac78
Oct  8 00:55:39 vh03 kernel:  8c8c6ebd 00008176 ed817214 ed817104 ed817070
ed817214 c16ec100 00000003
Oct  8 00:55:39 vh03 kernel:  edc08200 f0c8b32b 00000000 00003104 00000000
00000005 002689a5 ed014400
Oct  8 00:55:39 vh03 kernel: Call Trace:
Oct  8 00:55:39 vh03 kernel:  [<f0c9ac78>] ? conn_highest_peer+0x48/0x60
[drbd]
Oct  8 00:55:39 vh03 kernel:  [<f0c8b32b>] ? drbd_send_command+0x3b/0x50
[drbd]
Oct  8 00:55:39 vh03 kernel:  [<f0c9bfe8>] ? _req_st_cond+0xe8/0x130 [drbd]
Oct  8 00:55:39 vh03 kernel:  [<f0c9e846>] drbd_req_state+0x146/0x300 [drbd]
Oct  8 00:55:39 vh03 kernel:  [<c0139d23>] ? signal_wake_up+0x23/0x40
Oct  8 00:55:39 vh03 kernel:  [<c0139e19>] ? complete_signal+0xd9/0x1b0
Oct  8 00:55:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:55:39 vh03 kernel:  [<f0c9ea1e>] _drbd_request_state+0x1e/0xa0
[drbd]
Oct  8 00:55:39 vh03 kernel:  [<c013ac4b>] ? force_sig_info+0x9b/0xb0
Oct  8 00:55:39 vh03 kernel:  [<f0c96303>] drbd_set_role+0x53/0x550 [drbd]
Oct  8 00:55:39 vh03 kernel:  [<f0c93f45>] ? drbd_adm_prepare+0x255/0x430
[drbd]
Oct  8 00:55:39 vh03 kernel:  [<f0c96bed>] drbd_adm_set_role+0x9d/0xc0
[drbd]
Oct  8 00:55:39 vh03 kernel:  [<c0323c68>] genl_rcv_msg+0x158/0x190
Oct  8 00:55:39 vh03 kernel:  [<c0323b10>] ? genl_rcv_msg+0x0/0x190
Oct  8 00:55:39 vh03 kernel:  [<c0320fcd>] netlink_rcv_skb+0x7d/0xa0
Oct  8 00:55:39 vh03 kernel:  [<c03229ac>] genl_rcv+0x1c/0x30
Oct  8 00:55:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:55:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:55:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:55:39 vh03 kernel:  [<c02f7f3a>] sock_aio_write+0x10a/0x120
Oct  8 00:55:39 vh03 kernel:  [<c019631d>] do_sync_write+0xcd/0x110
Oct  8 00:55:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:55:39 vh03 kernel:  [<c0201aaf>] ?
security_file_permission+0xf/0x20
Oct  8 00:55:39 vh03 kernel:  [<c01964d0>] ? rw_verify_area+0x60/0xc0
Oct  8 00:55:39 vh03 kernel:  [<c0196b56>] vfs_write+0x146/0x150
Oct  8 00:55:39 vh03 kernel:  [<c01970fd>] sys_write+0x3d/0x70
Oct  8 00:55:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:57:39 vh03 kernel: INFO: task ovs-vswitchd:5724 blocked for more
than 120 seconds.
Oct  8 00:57:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:57:39 vh03 kernel: ovs-vswitchd  D ea841ce8     0  5724   5723
0x00000004
Oct  8 00:57:39 vh03 kernel:  ea841cfc 00200286 00000002 ea841ce8 c16eb224
00000000 00200200 ee404050
Oct  8 00:57:39 vh03 kernel:  ea8a2140 00000019 eddd99d4 eddd98c4 eddd9830
eddd99d4 c16ec100 00000003
Oct  8 00:57:39 vh03 kernel:  eb7e1900 00d8bdf8 00000000 c022a07e ea841f08
00000001 00000001 ea841cec
Oct  8 00:57:39 vh03 kernel: Call Trace:
Oct  8 00:57:39 vh03 kernel:  [<c022a07e>] ? copy_to_user+0x3e/0x60
Oct  8 00:57:39 vh03 kernel:  [<c0383c2b>] __mutex_lock_slowpath+0x10b/0x160
Oct  8 00:57:39 vh03 kernel:  [<c0383a80>] mutex_lock+0x20/0x40
Oct  8 00:57:39 vh03 kernel:  [<c03229a0>] genl_rcv+0x10/0x30
Oct  8 00:57:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:57:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:57:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:57:39 vh03 kernel:  [<c02f88ea>] sock_sendmsg+0xfa/0x110
Oct  8 00:57:39 vh03 kernel:  [<c01458a0>] ? enqueue_hrtimer+0x60/0x80
Oct  8 00:57:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:57:39 vh03 kernel:  [<c02f7ac0>] ? sock_destroy_inode+0x10/0x20
Oct  8 00:57:39 vh03 kernel:  [<c0229eac>] ? copy_from_user+0x3c/0x70
Oct  8 00:57:39 vh03 kernel:  [<c02f9493>] ? sys_recvmsg+0xf3/0x1c0
Oct  8 00:57:39 vh03 kernel:  [<c02f80a0>] ? sockfd_lookup_light+0x30/0x60
Oct  8 00:57:39 vh03 kernel:  [<c02f8bc5>] sys_sendto+0xa5/0xd0
Oct  8 00:57:39 vh03 kernel:  [<c0194625>] ? put_unused_fd+0x35/0x70
Oct  8 00:57:39 vh03 kernel:  [<c02f9973>] ? sys_accept4+0x1b3/0x1e0
Oct  8 00:57:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:57:39 vh03 kernel:  [<c02f8c26>] sys_send+0x36/0x40
Oct  8 00:57:39 vh03 kernel:  [<c02f9ae6>] sys_socketcall+0x146/0x270
Oct  8 00:57:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:57:39 vh03 kernel: INFO: task drbdsetup:18163 blocked for more
than 120 seconds.
Oct  8 00:57:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:57:39 vh03 kernel: drbdsetup     D 00000002     0 18163      1
0x00000000
Oct  8 00:57:39 vh03 kernel:  e9ecdc68 00000286 00000001 00000002 ed014410
e9ecdc04 e9ecdc14 f0c9ac78
Oct  8 00:57:39 vh03 kernel:  8c8c6ebd 00008176 ed817214 ed817104 ed817070
ed817214 c16ec100 00000003
Oct  8 00:57:39 vh03 kernel:  edc08200 f0c8b32b 00000000 00003104 00000000
00000005 002689a5 ed014400
Oct  8 00:57:39 vh03 kernel: Call Trace:
Oct  8 00:57:39 vh03 kernel:  [<f0c9ac78>] ? conn_highest_peer+0x48/0x60
[drbd]
Oct  8 00:57:39 vh03 kernel:  [<f0c8b32b>] ? drbd_send_command+0x3b/0x50
[drbd]
Oct  8 00:57:39 vh03 kernel:  [<f0c9bfe8>] ? _req_st_cond+0xe8/0x130 [drbd]
Oct  8 00:57:39 vh03 kernel:  [<f0c9e846>] drbd_req_state+0x146/0x300 [drbd]
Oct  8 00:57:39 vh03 kernel:  [<c0139d23>] ? signal_wake_up+0x23/0x40
Oct  8 00:57:39 vh03 kernel:  [<c0139e19>] ? complete_signal+0xd9/0x1b0
Oct  8 00:57:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:57:39 vh03 kernel:  [<f0c9ea1e>] _drbd_request_state+0x1e/0xa0
[drbd]
Oct  8 00:57:39 vh03 kernel:  [<c013ac4b>] ? force_sig_info+0x9b/0xb0
Oct  8 00:57:39 vh03 kernel:  [<f0c96303>] drbd_set_role+0x53/0x550 [drbd]
Oct  8 00:57:39 vh03 kernel:  [<f0c93f45>] ? drbd_adm_prepare+0x255/0x430
[drbd]
Oct  8 00:57:39 vh03 kernel:  [<f0c96bed>] drbd_adm_set_role+0x9d/0xc0
[drbd]
Oct  8 00:57:39 vh03 kernel:  [<c0323c68>] genl_rcv_msg+0x158/0x190
Oct  8 00:57:39 vh03 kernel:  [<c0323b10>] ? genl_rcv_msg+0x0/0x190
Oct  8 00:57:39 vh03 kernel:  [<c0320fcd>] netlink_rcv_skb+0x7d/0xa0
Oct  8 00:57:39 vh03 kernel:  [<c03229ac>] genl_rcv+0x1c/0x30
Oct  8 00:57:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:57:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:57:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:57:39 vh03 kernel:  [<c02f7f3a>] sock_aio_write+0x10a/0x120
Oct  8 00:57:39 vh03 kernel:  [<c019631d>] do_sync_write+0xcd/0x110
Oct  8 00:57:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:57:39 vh03 kernel:  [<c0201aaf>] ?
security_file_permission+0xf/0x20
Oct  8 00:57:39 vh03 kernel:  [<c01964d0>] ? rw_verify_area+0x60/0xc0
Oct  8 00:57:39 vh03 kernel:  [<c0196b56>] vfs_write+0x146/0x150
Oct  8 00:57:39 vh03 kernel:  [<c01970fd>] sys_write+0x3d/0x70
Oct  8 00:57:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:59:39 vh03 kernel: INFO: task ovs-vswitchd:5724 blocked for more
than 120 seconds.
Oct  8 00:59:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:59:39 vh03 kernel: ovs-vswitchd  D ea841ce8     0  5724   5723
0x00000004
Oct  8 00:59:39 vh03 kernel:  ea841cfc 00200286 00000002 ea841ce8 c16eb224
00000000 00200200 ee404050
Oct  8 00:59:39 vh03 kernel:  ea8a2140 00000019 eddd99d4 eddd98c4 eddd9830
eddd99d4 c16ec100 00000003
Oct  8 00:59:39 vh03 kernel:  eb7e1900 00d8bdf8 00000000 c022a07e ea841f08
00000001 00000001 ea841cec
Oct  8 00:59:39 vh03 kernel: Call Trace:
Oct  8 00:59:39 vh03 kernel:  [<c022a07e>] ? copy_to_user+0x3e/0x60
Oct  8 00:59:39 vh03 kernel:  [<c0383c2b>] __mutex_lock_slowpath+0x10b/0x160
Oct  8 00:59:39 vh03 kernel:  [<c0383a80>] mutex_lock+0x20/0x40
Oct  8 00:59:39 vh03 kernel:  [<c03229a0>] genl_rcv+0x10/0x30
Oct  8 00:59:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:59:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:59:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:59:39 vh03 kernel:  [<c02f88ea>] sock_sendmsg+0xfa/0x110
Oct  8 00:59:39 vh03 kernel:  [<c01458a0>] ? enqueue_hrtimer+0x60/0x80
Oct  8 00:59:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:59:39 vh03 kernel:  [<c02f7ac0>] ? sock_destroy_inode+0x10/0x20
Oct  8 00:59:39 vh03 kernel:  [<c0229eac>] ? copy_from_user+0x3c/0x70
Oct  8 00:59:39 vh03 kernel:  [<c02f9493>] ? sys_recvmsg+0xf3/0x1c0
Oct  8 00:59:39 vh03 kernel:  [<c02f80a0>] ? sockfd_lookup_light+0x30/0x60
Oct  8 00:59:39 vh03 kernel:  [<c02f8bc5>] sys_sendto+0xa5/0xd0
Oct  8 00:59:39 vh03 kernel:  [<c0194625>] ? put_unused_fd+0x35/0x70
Oct  8 00:59:39 vh03 kernel:  [<c02f9973>] ? sys_accept4+0x1b3/0x1e0
Oct  8 00:59:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:59:39 vh03 kernel:  [<c02f8c26>] sys_send+0x36/0x40
Oct  8 00:59:39 vh03 kernel:  [<c02f9ae6>] sys_socketcall+0x146/0x270
Oct  8 00:59:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 00:59:39 vh03 kernel: INFO: task drbdsetup:18163 blocked for more
than 120 seconds.
Oct  8 00:59:39 vh03 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  8 00:59:39 vh03 kernel: drbdsetup     D 00000002     0 18163      1
0x00000000
Oct  8 00:59:39 vh03 kernel:  e9ecdc68 00000286 00000001 00000002 ed014410
e9ecdc04 e9ecdc14 f0c9ac78
Oct  8 00:59:39 vh03 kernel:  8c8c6ebd 00008176 ed817214 ed817104 ed817070
ed817214 c16ec100 00000003
Oct  8 00:59:39 vh03 kernel:  edc08200 f0c8b32b 00000000 00003104 00000000
00000005 002689a5 ed014400
Oct  8 00:59:39 vh03 kernel: Call Trace:
Oct  8 00:59:39 vh03 kernel:  [<f0c9ac78>] ? conn_highest_peer+0x48/0x60
[drbd]
Oct  8 00:59:39 vh03 kernel:  [<f0c8b32b>] ? drbd_send_command+0x3b/0x50
[drbd]
Oct  8 00:59:39 vh03 kernel:  [<f0c9bfe8>] ? _req_st_cond+0xe8/0x130 [drbd]
Oct  8 00:59:39 vh03 kernel:  [<f0c9e846>] drbd_req_state+0x146/0x300 [drbd]
Oct  8 00:59:39 vh03 kernel:  [<c0139d23>] ? signal_wake_up+0x23/0x40
Oct  8 00:59:39 vh03 kernel:  [<c0139e19>] ? complete_signal+0xd9/0x1b0
Oct  8 00:59:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:59:39 vh03 kernel:  [<f0c9ea1e>] _drbd_request_state+0x1e/0xa0
[drbd]
Oct  8 00:59:39 vh03 kernel:  [<c013ac4b>] ? force_sig_info+0x9b/0xb0
Oct  8 00:59:39 vh03 kernel:  [<f0c96303>] drbd_set_role+0x53/0x550 [drbd]
Oct  8 00:59:39 vh03 kernel:  [<f0c93f45>] ? drbd_adm_prepare+0x255/0x430
[drbd]
Oct  8 00:59:39 vh03 kernel:  [<f0c96bed>] drbd_adm_set_role+0x9d/0xc0
[drbd]
Oct  8 00:59:39 vh03 kernel:  [<c0323c68>] genl_rcv_msg+0x158/0x190
Oct  8 00:59:39 vh03 kernel:  [<c0323b10>] ? genl_rcv_msg+0x0/0x190
Oct  8 00:59:39 vh03 kernel:  [<c0320fcd>] netlink_rcv_skb+0x7d/0xa0
Oct  8 00:59:39 vh03 kernel:  [<c03229ac>] genl_rcv+0x1c/0x30
Oct  8 00:59:39 vh03 kernel:  [<c0320d5a>] netlink_unicast+0x24a/0x270
Oct  8 00:59:39 vh03 kernel:  [<c0301e21>] ? memcpy_fromiovec+0x41/0x60
Oct  8 00:59:39 vh03 kernel:  [<c03219de>] netlink_sendmsg+0x1ce/0x290
Oct  8 00:59:39 vh03 kernel:  [<c02f7f3a>] sock_aio_write+0x10a/0x120
Oct  8 00:59:39 vh03 kernel:  [<c019631d>] do_sync_write+0xcd/0x110
Oct  8 00:59:39 vh03 kernel:  [<c0142b10>] ?
autoremove_wake_function+0x0/0x50
Oct  8 00:59:39 vh03 kernel:  [<c0201aaf>] ?
security_file_permission+0xf/0x20
Oct  8 00:59:39 vh03 kernel:  [<c01964d0>] ? rw_verify_area+0x60/0xc0
Oct  8 00:59:39 vh03 kernel:  [<c0196b56>] vfs_write+0x146/0x150
Oct  8 00:59:39 vh03 kernel:  [<c01970fd>] sys_write+0x3d/0x70
Oct  8 00:59:39 vh03 kernel:  [<c0104501>] syscall_call+0x7/0xb
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: sock_recvmsg returned -110
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: peer( Secondary -> Unknown )
conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: short read (expected size 16)
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: asender terminated
Oct  8 01:03:46 vh03 kernel: block drbd1: role( Secondary -> Primary )
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: Terminating asender thread
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: Connection closed
Oct  8 01:03:46 vh03 kernel: block drbd1: new current UUID
16D41637EA6A4C4B:5A99C51D68CDB446:188F44BA42FFFCF4:188E44BA42FFFCF4
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: conn( BrokenPipe ->
Unconnected )
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: receiver terminated
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: Restarting receiver thread
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: receiver (re)started
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: conn( Unconnected ->
WFConnection )
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: initial packet S crossed
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: Handshake successful: Agreed
network protocol version 101
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: conn( WFConnection ->
WFReportParams )
Oct  8 01:03:46 vh03 kernel: d-con drbd-sr1: Starting asender thread (from
drbd_r_drbd-sr1 [6827])
Oct  8 01:03:47 vh03 kernel: block drbd1: drbd_sync_handshake:
Oct  8 01:03:47 vh03 kernel: block drbd1: self
16D41637EA6A4C4B:5A99C51D68CDB446:188F44BA42FFFCF4:188E44BA42FFFCF4 bits:0
flags:0
Oct  8 01:03:47 vh03 kernel: block drbd1: peer
5A99C51D68CDB446:0000000000000000:188F44BA42FFFCF4:188E44BA42FFFCF4 bits:0
flags:0
Oct  8 01:03:47 vh03 kernel: block drbd1: uuid_compare()=1 by rule 70
Oct  8 01:03:47 vh03 kernel: block drbd1: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Oct  8 01:03:47 vh03 kernel: block drbd1: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Oct  8 01:03:47 vh03 kernel: block drbd1: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Oct  8 01:03:47 vh03 kernel: block drbd1: helper command: /sbin/drbdadm
before-resync-source minor-1
Oct  8 01:03:47 vh03 kernel: block drbd1: helper command: /sbin/drbdadm
before-resync-source minor-1 exit code 0 (0x0)
Oct  8 01:03:47 vh03 kernel: block drbd1: conn( WFBitMapS -> SyncSource )
pdsk( Consistent -> Inconsistent )
Oct  8 01:03:47 vh03 kernel: block drbd1: Began resync as SyncSource (will
sync 0 KB [0 bits set]).
Oct  8 01:03:47 vh03 kernel: block drbd1: updated sync UUID
16D41637EA6A4C4B:5A9AC51D68CDB446:5A99C51D68CDB446:188F44BA42FFFCF4
Oct  8 01:03:47 vh03 kernel: block drbd1: Resync done (total 1 sec; paused
0 sec; 0 K/sec)
Oct  8 01:03:47 vh03 kernel: block drbd1: updated UUIDs
16D41637EA6A4C4B:0000000000000000:5A9AC51D68CDB446:5A99C51D68CDB446
Oct  8 01:03:47 vh03 kernel: block drbd1: conn( SyncSource -> Connected )
pdsk( Inconsistent -> UpToDate )

*Looks like there was some whacky stuff going on with ovs-switchd (Open
vSwitch; part of XenServer) and drbd that locked the machine up.. then at
01:03:46 it decides to start working again, and vh03 is successfully made
the primary:*

[root at vh03 ~]# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by
root at xenddk.eightmile.locatrix.net, 2012-09-20 14:40:39
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:102855 nr:530600089 dw:530702944 dr:288517 al:196 bm:25600 lo:0 pe:0
ua:0 ap:0 ep:1 wo:f oos:0

Very strange! Any ideas, guys?

Thanks very much for any help.

Cheers,
Andrew



On Sun, Oct 7, 2012 at 10:50 AM, Felix Frank <ff at mpexnet.de> wrote:

> On 10/07/2012 03:22 PM, Dan Barker wrote:
>
>> My system logs them with timestamps. They just happen to be relative to
>> boot
>> time. I dmtime say "dmtime 12345" and see the real time stamp.
>>
>
> Yes, most kernels do that.
>
> In this particular use case, those relative times would even suffice,
> because one could tell how long it took from promotion to finish.
>
>
> Cheers,
> Felix
> ______________________________**_________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/**mailman/listinfo/drbd-user<http://lists.linbit.com/mailman/listinfo/drbd-user>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20121007/406c498c/attachment.htm>


More information about the drbd-user mailing list