Raymond Khalife raymond.khalife at hulu.com
Tue Jul 1 02:13:42 CEST 2014

I'm using DRBD 8.4.4 on CentOS 6.5.
I've created a drbd partition on two nodes (see pg.res below).
The machines are actually VMs running on Xen hosts.
The drbd partition is 800 GB.
(There isn't a dedicated channel for drbd yet, but I'll have that soon.)

If I make VMNode1 primary and mount the drbd device (ext4), I can
consistently crash the Xen host running VMNode1 by copying a large file,
call it F (~200 GB), to VMNode1.
What happens is that Dom0 runs out of memory and starts killing processes,
eventually making VMNode1 inaccessible.

If I take the secondary node offline, I can copy F to node1 without any
issues, then bring up node2 and it will sync.
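
For reference, the sequence in the previous paragraph can be sketched
roughly as below, run on VMNode1. This is only an illustration under some
assumptions: the mount point /mnt/pg and the source path /data/F are
placeholders, and "taking the secondary offline" is approximated here with
drbdadm disconnect (shutting down node2 would be another way). The script
only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the "copy while the peer is offline" sequence, run on VMNode1.
# Assumptions (placeholders, not real paths): MNT and SRC.
RES=pg
MNT=${MNT:-/mnt/pg}
SRC=${SRC:-/data/F}
DRY_RUN=${DRY_RUN:-1}   # default: print commands only, do not execute

# Print a command, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

copy_with_peer_offline() {
    run drbdadm primary "$RES"       # make this node primary
    run mount /dev/drbd0 "$MNT"      # mount the ext4 filesystem on the drbd device
    run drbdadm disconnect "$RES"    # assumption: stop replication to the peer
    run cp "$SRC" "$MNT/"            # copy the large file F
    run drbdadm connect "$RES"       # reconnect; the peer then resyncs
}

copy_with_peer_offline
```

Skipping the disconnect/connect pair gives the failing case, where the copy
is replicated to the peer as it is written.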

I've tried copying huge files between VMNode1 and VMNode2 on standard
partitions (not drbd managed) and that worked as expected.

I'm wondering if anyone's ever run across a similar issue and if there are
any recommendations for or against running DRBD in VMs running on Xen.


### pg.res
resource pg {
        protocol        C;
        startup {
                wfc-timeout             15;
                degr-wfc-timeout        60;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "mybigsecret";
                after-sb-0pri discard-least-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri call-pri-lost-after-sb;
        }
        syncer {
                csums-alg sha1;
        }
        disk {
                fencing resource-only;
        }
        handlers {
                fence-peer              "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target     "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        on hapostgresdev01 {
                device          /dev/drbd0;
                disk            /dev/xvdg;
                meta-disk       internal;
        }
        on hapostgresdev02 {
                device          /dev/drbd0;
                disk            /dev/xvdg;
                meta-disk       internal;
        }
}
