[DRBD-user] Load high on primary node while doing backup on secondary

Irwin Nemetz inemetz at hotmail.com
Wed Apr 23 19:16:24 CEST 2014

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


I have a two-node cluster. There are 3 mail servers running as KVM virtual machines on one node. The 3 VMs sit on top of a DRBD disk on an LVM volume which replicates to the passive second node.
Hardware: 2x 16-core AMD processors, 128 GB memory, 5x 3 TB SAS drives in RAID 5.
The drbd replication is over a crossover cable.
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil at Build64R6, 2013-10-14 15:33:06
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1029579824 dw:1029579824 dr:0 al:0 bm:176936 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1117874156 dw:1117874156 dr:0 al:0 bm:176928 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1443855844 dw:1443855844 dr:0 al:0 bm:196602 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
resource zapp {
  startup {
    wfc-timeout 10;
    outdated-wfc-timeout 10;
    degr-wfc-timeout 10;
  }
  disk {
    on-io-error detach;
    rate 40M;
    al-extents 3389;
  }
  net {
    verify-alg sha1;
    max-buffers 8000;
    max-epoch-size 8000;
    sndbuf-size 512k;
    cram-hmac-alg sha1;
    shared-secret sync_disk;
    data-integrity-alg sha1;
  }
  on nodea.cluster.dns {
    device /dev/drbd1;
    disk /dev/virtimages/zapp;
    address 10.88.88.171:7787;
    meta-disk internal;
  }
  on nodeb.cluster.dns {
    device /dev/drbd1;
    disk /dev/virtimages/zapp;
    address 10.88.88.172:7787;
    meta-disk internal;
  }
}
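For reference, this is how I double-check what DRBD actually applies from that config at runtime (just a sketch; zapp is the resource shown above, and the same commands should work for the other two resources):

/sbin/drbdadm dump zapp      # config as drbdadm parses it
/sbin/drbdadm sh-dev zapp    # which /dev/drbdX backs the resource
cat /proc/drbd               # live connection state and counters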
I am trying to do a backup of the VMs nightly. They are about 2.7TB each. I create a snapshot on the backup node, mount it, and then copy it to a NAS backup storage device. The NAS is on its own network.
Here's the script:
[root at nodeb ~]# cat backup-zapp.sh
#!/bin/bash

date

cat > /etc/drbd.d/snap.res <<EOF
resource snap {
  on nodea.cluster.dns {
    device /dev/drbd99;
    disk /dev/virtimages/snap-zapp;
    address 10.88.88.171:7999;
    meta-disk internal;
  }
  on nodeb.cluster.dns {
    device /dev/drbd99;
    disk /dev/virtimages/snap-zapp;
    address 10.88.88.172:7999;
    meta-disk internal;
  }
}
EOF
/sbin/lvcreate -L500G -s -n snap-zapp /dev/virtimages/zapp
/sbin/drbdadm up snap
sleep 2
/sbin/drbdadm primary snap
mount -t ext4 /dev/drbd99 /mnt/zapp

cd /rackstation/images
mv -vf zapp.img zapp.img.-1
mv -vf zapp-opt.img zapp-opt.img.-1
cp -av /mnt/zapp/*.img /rackstation/images

umount /mnt/zapp
/sbin/drbdadm down snap
rm -f /etc/drbd.d/snap.res
/sbin/lvremove -f /dev/virtimages/snap-zapp

date
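One variation on the copy step I have been considering, in case the plain cp is simply hammering the snapshot too hard (this assumes rsync and ionice are available on nodeb, and the bandwidth limit is just a placeholder value):

# possible replacement for the cp line above (sketch only)
ionice -c3 rsync -av --bwlimit=100000 /mnt/zapp/*.img /rackstation/images/

As far as I know, ionice -c3 (idle class) only has an effect with the CFQ I/O scheduler, so that part may be a no-op depending on the elevator in use on nodeb.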
About halfway through the copy, the copy starts stuttering (network traffic stops and starts) and the load on the primary machine and on the virtual machine being copied shoots through the roof.
I am at a loss to explain this, since the copy is dealing with a snapshot of a volume on the replicated node. The only reasonable explanation I can think of is that the DRBD replication is being blocked by something, and this is causing the disk on the primary node to become unresponsive.
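If it helps, I can log the DRBD counters and the snapshot fill level on nodeb while the copy runs, roughly like this (sketch only; the interval and log path are arbitrary):

#!/bin/bash
# sample DRBD state and snapshot usage once a minute during the backup window
while true; do
  date
  cat /proc/drbd                        # watch pe/lo/ua for signs of stalled replication
  /sbin/lvs /dev/virtimages/snap-zapp   # Data% shows how full the CoW snapshot is
  echo "----"
  sleep 60
done >> /var/log/backup-monitor.log 2>&1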
Irwin

