Hi,

I moved my Proxmox cluster (essentially two physical servers, two Cisco NAS units where the KVM VM images live, and two switches) to a new data centre, where the machines now have new IP addresses. I reconfigured basic networking on the two servers, updated the IP addresses in the Proxmox config and rebooted the boxes, master node first.

The storage is set up as /dev/drbdvg0 and /dev/drbdvg1. I didn't install this myself and I'm not that familiar with DRBD or indeed iSCSI. Both are used to store KVM guest virtual machine images, seen by both servers.

Everything looked fine until I attempted to start a VM on the second (slave) node. It took ages to start, hanging for thirty seconds at a time. It was clearly miscommunicating with the NAS. All of the images, including those set up on the second node, run fine on the first (and that's what I'm doing for now). So the first (master) box has excellent access to the NAS, while the second (slave) has trouble reading from it.

On the first box, /proc/drbd looks like this:

version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:0 dw:27568823 dr:156762105 al:309656 bm:309639 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10184632
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:0 dw:2451648 dr:14918745 al:1244 bm:1211 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1152564

And on the second, troublesome box:

version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:0 nr:0 dw:0 dr:1705944 al:0 bm:107 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:954596
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:0 nr:0 dw:0 dr:1821288 al:0 bm:107 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:520192

So it looks like at some level they aren't talking to each other: the first node is waiting for a connection (WFConnection), the second is StandAlone, and I don't see the usual "UpToDate/UpToDate" on either.
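In case it helps anyone compare the two nodes, here is a minimal Python sketch that pulls the connection state (cs), roles (ro) and disk states (ds) out of /proc/drbd-style text and lists the devices that are not in the normal "Connected" and "UpToDate/UpToDate" state. The function names and the SAMPLE constant are my own, purely for illustration, and the parsing assumes the 8.3-style layout shown above.

```python
import re

# Matches per-device status lines such as
# " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)

# Output copied from the second node in this post.
SAMPLE = """\
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:0 nr:0 dw:0 dr:1705944 al:0 bm:107 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:954596
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
    ns:0 nr:0 dw:0 dr:1821288 al:0 bm:107 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:520192
"""


def parse_drbd_status(text):
    """Return one dict per DRBD minor device found in the given text."""
    devices = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            devices.append(
                {
                    "minor": int(match.group("minor")),
                    "cs": match.group("cs"),  # connection state
                    "ro": match.group("ro"),  # local/peer roles
                    "ds": match.group("ds"),  # local/peer disk states
                }
            )
    return devices


def not_in_sync(devices):
    """Return devices that are not connected with both disks up to date."""
    return [
        d for d in devices
        if d["cs"] != "Connected" or d["ds"] != "UpToDate/UpToDate"
    ]


if __name__ == "__main__":
    for dev in not_in_sync(parse_drbd_status(SAMPLE)):
        print(dev["minor"], dev["cs"], dev["ro"], dev["ds"])
```

On a live node the same functions could be fed the contents of /proc/drbd instead of SAMPLE. This only reports state; it does not attempt any reconnection or resync.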

I'm also seeing lots of messages like this on the second node:

 connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4329026692, last ping 4329027942, now 4329029192
 connection1:0: detected conn error (1011)

Can anyone suggest what might have gone wrong here? A cabling issue, maybe? And how can I fix it? I'm particularly anxious to avoid losing updates to the images as seen by the first node if the two nodes manage to sync up. I don't want to lose or corrupt the VM images!

I inherited this setup and I'm not that familiar with DRBD, though keen to learn. Very grateful for any advice.

Thanks,
James