[DRBD-user] Congestion and freeze heavy write/copy

Mon Apr 7 12:53:37 CEST 2014

Dear drbd users,

I am currently facing a problem with DRBD replication. I set up an HA NAS
using heartbeat and DRBD. The DRBD disk is shared through NFS to a Proxmox
server that runs VMs and CTs. Those CTs and VMs are stored on the NAS.

My problem is that when I copy big files (more than 1 GB), my VMs and
CTs freeze. For example, Apache on my VMs stops answering HTTP
requests, which is a real issue for us because the VMs host our web
applications.

It seems that we have a congestion issue somewhere. However, we cannot
use protocol A and the on-congestion parameters because we need the two
nodes to be always synchronised. The two NAS nodes use a Gigabit Ethernet
connection for DRBD and NFS.
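
For a sense of scale, here is a minimal sketch of the best-case time needed to push a copy over a Gigabit link. It assumes the full nominal 1 Gbit/s is available to replication and ignores protocol overhead, NFS traffic on the same link and disk speed, so real figures will be worse. The function name and constant are illustrative only.

```python
# Nominal Gigabit Ethernet line rate in bits per second (assumption: no overhead).
GIGABIT_BITS_PER_S = 1_000_000_000


def min_replication_seconds(size_bytes, link_bits_per_s=GIGABIT_BITS_PER_S):
    """Lower bound on the time to send size_bytes over the link."""
    return size_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    # Copy sizes mentioned in the post, using decimal gigabytes.
    for gigabytes in (1, 3):
        seconds = min_replication_seconds(gigabytes * 10**9)
        print(gigabytes, "GB needs at least", seconds, "s")
```

With protocol C a write completes only once the peer has confirmed it, so for at least that long the replication link is busy with the copy.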

Here is my .res file:
resource btrfs {
        protocol C;
        startup {
                wfc-timeout 0;
                degr-wfc-timeout 120;
                become-primary-on nas1;
        }
        disk {
                on-io-error detach;
                al-extents 3389;
                disk-barrier no;
                disk-flushes no;
        }

        net {
                after-sb-0pri discard-older-primary;
                after-sb-1pri call-pri-lost-after-sb;
                after-sb-2pri call-pri-lost-after-sb;
                max-buffers 8000;
                max-epoch-size 8000;
                sndbuf-size 512k;
        }

        on nas1 {
                device    /dev/drbd0;
                disk      /dev/md3;
                meta-disk internal;
                address   ***.***.***.***:7788;
        }
        on nas2 {
                device    /dev/drbd0;
                disk      /dev/sda3;
                meta-disk internal;
                address   ***.***.***.***:7788;
        }
}

When I copy big files (using dd or rsync), cat /proc/drbd shows the flags
a, b or n, which means that there is congestion.
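
To make the flag reading concrete, here is a minimal sketch that extracts the fifth I/O flag ("locally blocked I/O") from /proc/drbd output. It assumes the 8.4-style status line with six I/O flags at the end, and the flag meanings are taken from the DRBD 8.4 user guide, so treat both as assumptions for other versions. The sample text is invented for illustration and is not output from the setup described here.

```python
import re

# Fifth I/O flag ("locally blocked I/O") as described in the DRBD 8.4 user
# guide. Assumption: other DRBD versions may use a different layout.
BLOCKED_IO = {
    "-": "not blocked",
    "d": "blocked for a reason internal to DRBD",
    "b": "backing device I/O is blocking",
    "n": "congestion on the network socket",
    "a": "backing device blocking and network congestion at once",
}

# Matches e.g. " 0: cs:Connected ro:... ds:... C r-----" and captures the
# minor number plus the six I/O flags at the end of the line.
STATUS_LINE = re.compile(r"^\s*(\d+):\s+cs:\S+.*\s([rs][a-z-]{5})\s*$")


def blocked_io(proc_drbd_text):
    """Map each DRBD minor number to a description of its fifth I/O flag."""
    result = {}
    for line in proc_drbd_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            minor, flags = int(match.group(1)), match.group(2)
            result[minor] = BLOCKED_IO.get(flags[4], "unknown flag " + flags[4])
    return result


# Invented sample, not taken from the system in this post.
SAMPLE = """\
version: 8.4.x (sample)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---n-
    ns:1048576 nr:0 dw:1048576 dr:0 al:25 bm:0 lo:0 pe:12 ua:0 ap:12 ep:1 wo:d oos:0
"""

if __name__ == "__main__":
    print(blocked_io(SAMPLE))
```

On a live node the same function could be fed the contents of /proc/drbd, for example in a loop while the copy is running, to see whether the backing device (b), the network (n) or both (a) are reported as blocking.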

I have tried protocol B, but it doesn't change anything. I have also tried
c-min-rate and c-max-rate, but I am not sure which values I should use, and
they didn't seem to have any effect.

I hope you can help me because this is a huge problem for us. We need our
DRBD replication to stay up to date and, at the same time, our VMs and CTs
to keep working. I don't need them to be as fast as usual, just not frozen.
I cannot have all my VMs freeze each time a copy bigger than 3 GB happens
on the NAS.

I have tried a lot of things and read a lot about this on the internet, but
I couldn't find any solution, so I hope that someone here can help me solve
this problem.

Thanks and regards,
David