[DRBD-user] DRBD hangs Xen VMs and won't disconnect without pulling plug

nathan at robotics.net nathan at robotics.net
Thu Aug 7 17:01:12 CEST 2008



Your networking issues may actually have nothing to do with DRBD. I had 
very similar issues and they were caused by the Xen memory balloon. If 
you're using this on a server, you can fix it by giving Dom0 a fixed 
amount of memory and disabling memory ballooning.

Edit /etc/xen/xend-config.sxp and change (dom0-min-mem 256) to 
(dom0-min-mem 0). Then, to set the amount of RAM for Dom0, edit 
/boot/grub/menu.lst and add dom0_mem={amount of RAM for Dom0} to the 
kernel /boot/xen.... line. I have 32 GB of RAM so I give Dom0 2 GB; my 
line looks like:

        kernel /boot/xen.gz-3.2 dom0_mem=2048M

All of my network issues on all 4 of my xen boxes were solved by the 
above.
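
If you want to double-check both files after editing, something like the 
following might help. It is only a rough sketch in Python: the function 
name and the sample strings are made up for illustration, and it just 
pattern-matches the two settings described above (dom0-min-mem set to 0, 
and dom0_mem= present on the Xen kernel line). It does not touch Xen 
itself.

```python
import re

def check_dom0_memory_settings(xend_config_text, grub_menu_text):
    """Return a list of problems; an empty list means both settings look right.

    Hypothetical helper: it only inspects the text of xend-config.sxp
    and menu.lst that the caller passes in.
    """
    problems = []

    # xend-config.sxp: find an uncommented (dom0-min-mem N) entry.
    m = re.search(r"^\s*\(dom0-min-mem\s+(\d+)\)", xend_config_text, re.MULTILINE)
    if m is None:
        problems.append("no (dom0-min-mem ...) entry found in xend-config.sxp")
    elif int(m.group(1)) != 0:
        problems.append("dom0-min-mem is %s, expected 0" % m.group(1))

    # menu.lst: each uncommented 'kernel .../xen...' line should carry dom0_mem=.
    xen_lines = [ln for ln in grub_menu_text.splitlines()
                 if re.match(r"\s*kernel\s+\S*/xen", ln)]
    if not xen_lines:
        problems.append("no Xen kernel line found in menu.lst")
    for ln in xen_lines:
        if not re.search(r"(?:^|\s)dom0_mem=\d+", ln):
            problems.append("missing dom0_mem= on: " + ln.strip())

    return problems

# Sample inputs (assumed contents, not copied from a real system).
sample_xend = "(xend-relocation-server yes)\n(dom0-min-mem 0)\n"
sample_grub = "title Xen\n\tkernel /boot/xen.gz-3.2 dom0_mem=2048M\n"
print(check_dom0_memory_settings(sample_xend, sample_grub))
```

To run it against a real box you would read /etc/xen/xend-config.sxp and 
/boot/grub/menu.lst into strings and pass them in. The new dom0_mem 
value only takes effect after a reboot, since it is a boot parameter.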

><>
Nathan Stratton                                CTO, BlinkMind, Inc.
nathan at robotics.net                         nathan at blinkmind.com
http://www.robotics.net                        http://www.blinkmind.com

On Thu, 7 Aug 2008, simon at onepointltd.com wrote:

> Would appreciate some help debugging this problem, and hopefully solving it.
>
> I am running paravirtualized 64-bit CentOS 5.x VMs on 64-bit CentOS 5.x Dom0 on DRBD partitions shared between two Dell 2590s. The DRBD connections are shared between two dedicated GB network ports using crossover cables. The DRBD partitions are logical volumes used as virtual disks for the actual VMs and as mounted pre-formatted ext3 partitions for their data partitions.
>
> Occasionally, the VMs will lock up, usually (I think) because they are unable to access their data partition. In this fault condition "drbdadm disconnect <resourcename>" times out on both nodes. I can only resolve the situation by breaking the network connection with an "ifdown ethn" command. The VM is then able to carry on working and I can reconnect DRBD and carry on.
>
> Under the fault condition I have had one VM where I could still log in via SSH but could not access the data partition, and another case this morning where SSH was not working. So I am not yet 100% sure whether it is solely the data partitions of the VMs that are the problem.
>
> I can't see anything strange in /var/log/messages other than the expected time-outs that occur when I disconnect the network.
>
> Running kernel on both Dom0 machines is 2.6.18-92.1.6.el5xen.
> DRBD rpms are
> kmod-drbd82-xen-8.2.6-1.2.6.18_92.1.6.el5
> drbd82-8.2.6-1.el5.centos
>
> Here is a sample VM and its drbd.conf entries. Although I am allowing dual primary, this mode is not normally used. It is there for live migrating VMs from one machine to the other, which is (currently) a manual operation.
>
>
> ________drbd.conf ___________
> global { usage-count no; }
> common { syncer { rate 600M; } }
> resource webcast {
> protocol C;
>
> startup {
> wfc-timeout 300;
> degr-wfc-timeout 280;
> }
>
> disk {
> on-io-error pass_on;
> }
>
> net {
> allow-two-primaries;
> cram-hmac-alg sha1;
> shared-secret "webcast_server";
> }
> on xen1.bkwsu.eu {
> device /dev/drbd5;
> disk /dev/MultiCopy/webcast;
> address 10.0.0.1:7789;
> meta-disk internal;
> }
> on xen2.bkwsu.eu {
> device /dev/drbd5;
> disk /dev/SingleCopy/webcast;
> address 10.0.0.2:7789;
> meta-disk internal;
> }
> }
> resource WebcastArchiveDRBD {
> protocol C;
>
> startup {
> wfc-timeout 300;
> degr-wfc-timeout 280;
> }
>
> disk {
> on-io-error pass_on;
> }
>
> net {
> allow-two-primaries;
> cram-hmac-alg sha1;
> shared-secret "webcast_server";
> }
> on xen1.bkwsu.eu {
> device /dev/drbd21;
> disk /dev/MultiCopy/WebcastArchiveDRBD;
> address 10.0.0.1:7790;
> meta-disk internal;
> }
> on xen2.bkwsu.eu {
> device /dev/drbd21;
> disk /dev/SingleCopy/WebcastArchiveDRBD;
> address 10.0.0.2:7790;
> meta-disk internal;
> }
> }
>
> ________webcast xen definition__________
> name = "webcast"
> uuid = "e550453f-d1a4-3430-b7d5-6c7a4bf40b5e"
> maxmem = 256
> memory = 256
> vcpus = 2
> bootloader = "/usr/bin/pygrub"
> on_poweroff = "destroy"
> on_reboot = "restart"
> on_crash = "restart"
> vfb = [ "type=vnc,vncunused=1" ]
> disk = [ "phy:/dev/drbd5,xvda,w", "phy:/dev/drbd21,sdb1,w" ]
> vif = [ "mac=00:16:3e:2e:6e:1b,bridge=xenbr0", "mac=00:16:3e:10:f7:d3,bridge=xenbr0" ]
>
>
>


