[DRBD-user] Xenserver 6.1 - network problem when i promote node to primary

gerrykernan gerry.kernan at infinityit.ie
Mon Apr 22 13:21:36 CEST 2013



Hi,

I am using DRBD 8.4.3-2 on Citrix XenServer 6.1, but when I run "drbdadm
primary all", the command hangs and eventually returns "command timed out
after 120 sec".
In dmesg I get the errors below. Then both NICs get disconnected and come
back online after a few minutes. When the system eventually comes back, the
node is Primary/Secondary and connected, but it takes a long time.
Resource 1 is 220 GB and resource 2 is 900 GB.


drbd version
drbd-utils-8.4.3-2
drbd-xen-8.4.3-2
drbd-km-2.6.32.43_0.4.1.xs1.6.10.734.170748xen-8.4.3-2
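
The userland tools (drbdadm, drbdsetup) and the kernel module should come from the same DRBD release, and all three packages above end in 8.4.3-2. A minimal sketch that checks this mechanically, assuming RPM-style NAME-VERSION-RELEASE package names (the helper names are hypothetical). On a running node the loaded module's version can also be read from the first line of /proc/drbd.

```python
from typing import List

# Package names copied from the post (RPM-style NAME-VERSION-RELEASE assumed).
PACKAGES = [
    "drbd-utils-8.4.3-2",
    "drbd-xen-8.4.3-2",
    "drbd-km-2.6.32.43_0.4.1.xs1.6.10.734.170748xen-8.4.3-2",
]


def version_of(package: str) -> str:
    """Return the trailing VERSION-RELEASE of an RPM-style package name."""
    # rsplit with maxsplit=2 keeps any dashes inside NAME
    # (e.g. the kernel string embedded in drbd-km-...).
    _name, version, release = package.rsplit("-", 2)
    return f"{version}-{release}"


def versions_match(packages: List[str]) -> bool:
    """True if every package carries the same VERSION-RELEASE suffix."""
    return len({version_of(p) for p in packages}) == 1


if __name__ == "__main__":
    for pkg in PACKAGES:
        print(f"{pkg}: {version_of(pkg)}")
    print("all match:", versions_match(PACKAGES))
```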

[  600.973858] INFO: task ovs-vswitchd:5615 blocked for more than 120 seconds.
[  600.973868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  600.973874] ovs-vswitchd  D eeaabc9c     0  5615   5614 0x00000004
[  600.973879]  eeaabcb0 00000282 00000002 eeaabc9c c16db324 00000000 00000000 00000000
[  600.973886]  00000000 00000000 0000000f e871f4a4 e871f394 e871f300 e871f4a4 c16dd200
[  600.973892]  00000002 b948bfd4 0000005a ecff8580 00000019 00002306 00000000 c01d4790
[  600.973898] Call Trace:
[  600.973908]  [<c01d4790>] ? pollwake+0x0/0x70
[  600.973911]  [<c01d4790>] ? pollwake+0x0/0x70
[  600.973917]  [<c03d418c>] __mutex_lock_slowpath+0x10c/0x160
[  600.973921]  [<c03d3fe5>] mutex_lock+0x25/0x40
[  600.973926]  [<c036d875>] genl_rcv+0x15/0x30
[  600.973929]  [<c036ba81>] netlink_unicast+0x241/0x250
[  600.973934]  [<c0349acc>] ? memcpy_fromiovec+0x4c/0x70
[  600.973938]  [<c036c771>] netlink_sendmsg+0x1c1/0x280
[  600.973941]  [<c033ffd7>] sock_sendmsg+0xd7/0x100
[  600.973945]  [<c014e6b0>] ? autoremove_wake_function+0x0/0x50
[  600.973949]  [<c014e6b0>] ? autoremove_wake_function+0x0/0x50
[  600.973952]  [<c0151cd1>] ? __hrtimer_start_range_ns+0xe1/0x190
[  600.973957]  [<c0261871>] ? copy_from_user+0x41/0x70
[  600.973960]  [<c0349df6>] ? verify_iovec+0x36/0xa0
[  600.973963]  [<c0340116>] sys_sendmsg+0x116/0x230
[  600.973967]  [<c0340c07>] ? sys_recvmsg+0xf7/0x1c0
[  600.973971]  [<c0146b23>] ? get_signal_to_deliver+0xa3/0x4e0
[  600.973976]  [<c01c43d9>] ? do_sync_read+0xd9/0x110
[  600.973979]  [<c033f0d4>] ? sock_poll+0x14/0x20
[  600.973983]  [<c01f540a>] ? ep_send_events_proc+0x5a/0x100
[  600.973987]  [<c01f58ac>] ? ep_scan_ready_list+0xfc/0x150
[  600.973991]  [<c03413a7>] sys_socketcall+0x247/0x270
[  600.973995]  [<c012c420>] ? default_wake_function+0x0/0x20
[  600.973999]  [<c0104571>] syscall_call+0x7/0xb
[  600.974018] INFO: task drbdsetup:8432 blocked for more than 120 seconds.
[  600.974023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  600.974029] drbdsetup     D 00000001     0  8432      1 0x00000004
[  600.974033]  ea92dc54 00000286 ea92dbd8 00000001 00000003 eeaf4d08 eeaf4d04 00000000
[  600.974039]  00000000 b94886b7 0000005a ee36d754 ee36d644 ee36d5b0 ee36d754 c16bd200
[  600.974045]  00000000 b9481226 0000005a edf373c0 00000000 00000008 edb9c198 eeaf4c00
[  600.974050] Call Trace:
[  600.974065]  [<f0f2524d>] ? _req_st_cond+0xed/0x130 [drbd]
[  600.974075]  [<f0f2834b>] drbd_req_state+0x14b/0x310 [drbd]
[  600.974079]  [<c01444a9>] ? complete_signal+0xd9/0x1b0
[  600.974082]  [<c014e6b0>] ? autoremove_wake_function+0x0/0x50
[  600.974092]  [<f0f28533>] _drbd_request_state+0x23/0xb0 [drbd]
[  600.974096]  [<c0145435>] ? force_sig_info+0xa5/0xc0
[  600.974107]  [<f0f1e968>] drbd_set_role+0x58/0x780 [drbd]
[  600.974118]  [<f0f1f4c6>] drbd_adm_set_role+0xa6/0xc0 [drbd]
[  600.974122]  [<c036ebc3>] genl_rcv_msg+0x183/0x1c0
[  600.974126]  [<c036ea40>] ? genl_rcv_msg+0x0/0x1c0
[  600.974129]  [<c036bced>] netlink_rcv_skb+0x7d/0xa0
[  600.974132]  [<c036d881>] genl_rcv+0x21/0x30
[  600.974136]  [<c036ba81>] netlink_unicast+0x241/0x250
[  600.974139]  [<c0349acc>] ? memcpy_fromiovec+0x4c/0x70
[  600.974143]  [<c036c771>] netlink_sendmsg+0x1c1/0x280
[  600.974146]  [<c033f63b>] sock_aio_write+0xeb/0x100
[  600.974150]  [<c01c42c9>] do_sync_write+0xd9/0x110
[  600.974154]  [<c014e6b0>] ? autoremove_wake_function+0x0/0x50
[  600.974158]  [<c01c4b68>] vfs_write+0x178/0x180
[  600.974161]  [<c01c5192>] sys_write+0x42/0x70
[  600.974165]  [<c0104571>] syscall_call+0x7/0xb
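
Both hung tasks above pass through genl_rcv. In 2.6.32-era kernels genl_rcv holds the generic-netlink mutex while a request is dispatched, so one plausible reading (not confirmed) is that drbdsetup sleeps in drbd_req_state with that mutex held, while ovs-vswitchd, itself sending a generic netlink message (sys_sendmsg, netlink_unicast, genl_rcv, mutex_lock), waits for it; that would fit the NICs dropping out. A minimal sketch that pulls the confirmed frames out of each hung-task report and lists the ones they share. It assumes unwrapped dmesg lines; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Hung-task header as printed by the kernel (one unwrapped dmesg line), e.g.
# "INFO: task drbdsetup:8432 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Call-trace frame, e.g. "[<c036d875>] genl_rcv+0x15/0x30". Frames printed
# with a leading "?" are stack words the unwinder could not confirm.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x")


def parse_hung_tasks(log: str) -> Dict[str, List[str]]:
    """Map "name:pid" to the confirmed (non-"?") frames of its call trace."""
    tasks: Dict[str, List[str]] = {}
    current = None
    for line in log.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = f"{header.group(1)}:{header.group(2)}"
            tasks[current] = []
            continue
        frame = FRAME_RE.search(line)
        # Skip frames before the first header and frames marked with "?".
        if frame and current is not None and frame.group(1) is None:
            tasks[current].append(frame.group(2))
    return tasks


def shared_frames(tasks: Dict[str, List[str]]) -> List[str]:
    """Functions present in every task's trace, in the first task's order."""
    traces = list(tasks.values())
    if not traces:
        return []
    common = set(traces[0]).intersection(*(set(t) for t in traces[1:]))
    return [f for f in dict.fromkeys(traces[0]) if f in common]
```

Fed the unwrapped log from this post, shared_frames should report genl_rcv, netlink_unicast, netlink_sendmsg and syscall_call.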


config
/etc/drbd.d/global_common.conf
global { usage-count yes; }
common {
    protocol C;
    net {
        shared-secret "#####";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
    }
    disk { max-bio-bvecs 1; }
    handlers { split-brain "/usr/lib/drbd/notify-split-brain.sh"; }
    syncer { rate 40M; }
}
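
With resources of 220 GB and 900 GB and "rate 40M", a back-of-the-envelope full-resync time is easy to work out. A minimal sketch, assuming the sizes are GiB, that 40M means 40 MiB/s, and that the resync runs at a constant rate (in practice it may not):

```python
# Assumptions: sizes in GiB, "rate 40M" = 40 MiB/s, constant throughput.
RESOURCES_GIB = {"drbd-sr1": 220, "drbd-sr2": 900}
RATE_MIB_PER_S = 40


def full_resync_seconds(size_gib: float, rate_mib_s: float) -> float:
    """Seconds to copy size_gib GiB at a constant rate_mib_s MiB/s."""
    return size_gib * 1024 / rate_mib_s


if __name__ == "__main__":
    for name, size in RESOURCES_GIB.items():
        secs = full_resync_seconds(size, RATE_MIB_PER_S)
        print(f"{name}: {secs:.0f} s (~{secs / 3600:.1f} h)")
```

Under those assumptions drbd-sr1 needs about 5632 s (roughly 1.6 h) and drbd-sr2 about 23040 s (6.4 h).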


/etc/drbd.d/drbd-sr1.res
resource drbd-sr1 {
    device /dev/drbd1;
    disk /dev/cciss/c0d1p1;
    meta-disk internal;
    on xenoctagon-1 { address 10.100.100.1:7788; }
    on xen-octagon2 { address 10.100.100.2:7788; }
}

/etc/drbd.d/drbd-sr2.res
resource drbd-sr2 {
    device /dev/drbd2;
    disk /dev/cciss/c0d2p1;
    meta-disk internal;
    on xenoctagon-1 { address 10.100.100.1:7789; }
    on xen-octagon2 { address 10.100.100.2:7789; }
}
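
Each resource needs its own IP:port pair on each node (7788 for drbd-sr1 and 7789 for drbd-sr2 here). A minimal sketch that extracts the "on HOST { address IP:PORT; }" entries from the two resource files and flags any pair claimed by more than one resource. It only handles the one-line "on" form used above and is not a general drbd.conf parser; the helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# The two resource files from the post.
SR1 = """resource drbd-sr1 {
    device /dev/drbd1;
    disk /dev/cciss/c0d1p1;
    meta-disk internal;
    on xenoctagon-1 { address 10.100.100.1:7788; }
    on xen-octagon2 { address 10.100.100.2:7788; }
}"""
SR2 = """resource drbd-sr2 {
    device /dev/drbd2;
    disk /dev/cciss/c0d2p1;
    meta-disk internal;
    on xenoctagon-1 { address 10.100.100.1:7789; }
    on xen-octagon2 { address 10.100.100.2:7789; }
}"""

NAME_RE = re.compile(r"\bresource\s+(\S+)\s*\{")
ON_RE = re.compile(r"\bon\s+(\S+)\s*\{\s*address\s+([\d.]+):(\d+);\s*\}")


def endpoints(text: str) -> Tuple[str, List[Tuple[str, str, int]]]:
    """Return (resource name, [(host, ip, port), ...]) for one resource file."""
    match = NAME_RE.search(text)
    if match is None:
        raise ValueError("no resource block found")
    hosts = [(h, ip, int(port)) for h, ip, port in ON_RE.findall(text)]
    return match.group(1), hosts


def shared_endpoints(texts: List[str]) -> Dict[Tuple[str, int], List[str]]:
    """IP:port pairs claimed by more than one resource (should be empty)."""
    users: Dict[Tuple[str, int], List[str]] = {}
    for text in texts:
        name, hosts = endpoints(text)
        for _host, ip, port in hosts:
            users.setdefault((ip, port), []).append(name)
    return {ep: names for ep, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    print(shared_endpoints([SR1, SR2]) or "no shared IP:port pairs")
```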

Regards

Gerry Kernan



--
View this message in context: http://drbd.10923.n7.nabble.com/Xenserver-6-1-network-problem-when-i-promote-node-to-primary-tp17727.html
Sent from the DRBD - User mailing list archive at Nabble.com.


