[DRBD-user] Please help with slow Replication / Syncing on debian squeeze 6.0, 8-disk RAID6

Uwe Schuerkamp uwe.schuerkamp at nionex.net
Mon Mar 12 11:56:22 CET 2012

Hi folks,

I'm a DRBD newbie when it comes to administration, so please bear with
me for a second while I describe my problem:

- Setup: 2x COMPAQ ProLiant DL 385g7, RAID 6 over 8 900GB SAS disks on
internal p410i controller, initial RAID rebuild completed, 128 GB RAM,
2 x HexaCore CPUs

- OS: Debian Squeeze 6.0 64bit, 128GB RAM, DRBD 8.3.7 (from Repos) 

- Dedicated gigabit link on eth1 for DRBD sync / replication, speed
around 100MB/sec using rsync or scp. The interfaces are directly
connected, there's no other network hardware involved, collision /
error stats on both ends seem fine and show no apparent problems.

show all output on drbd0: 

# drbdsetup /dev/drbd0 show
disk {
        size                    0s _is_default; # bytes
        on-io-error             pass_on _is_default;
        fencing                 dont-care _is_default;
        max-bio-bvecs           0 _is_default;
}
net {
        timeout                 60 _is_default; # 1/10 seconds
        max-epoch-size          2048 _is_default;
        max-buffers             2048 _is_default;
        unplug-watermark        128 _is_default;
        connect-int             10 _is_default; # seconds
        ping-int                10 _is_default; # seconds
        sndbuf-size             0 _is_default; # bytes
        rcvbuf-size             0 _is_default; # bytes
        ko-count                0 _is_default;
        cram-hmac-alg           "sha1";
        shared-secret           "XXXXXXXXXXXXXXXXXXXXXXXXXXX"; (edited) 
        after-sb-0pri           disconnect _is_default;
        after-sb-1pri           disconnect _is_default;
        after-sb-2pri           disconnect _is_default;
        rr-conflict             disconnect _is_default;
        ping-timeout            5 _is_default; # 1/10 seconds
}
syncer {
        rate                    102400k; # bytes/second
        after                   -1 _is_default;
        al-extents              127 _is_default;
}
protocol C;
_this_host {
        device                  minor 0;
        disk                    "/dev/vg00/lvol0";
        meta-disk               internal;
        address                 ipv4;
}
_remote_host {
        address                 ipv4;
}

Some benchmarks on the mounted xfs filesystem when syncing is active: 

# time  dd oflag=direct if=/dev/zero of=/mnt/disk/speedtest bs=10M count=10 
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 33.2383 s, 3.2 MB/s

Same fs, Syncing inactive: 

# time  dd oflag=direct if=/dev/zero of=/mnt/disk/speedtest bs=1024M count=8
8+0 records in
8+0 records out
8589934592 bytes (8.6 GB) copied, 27.3747 s, 314 MB/s
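
As a cross-check on the two dd runs, here is a small Python sketch that
recomputes the throughput from the byte counts and elapsed times dd
printed. The function name throughput_mb_per_s is only an illustration,
and it assumes dd's "MB" means decimal megabytes (10^6 bytes).

```python
def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Return throughput in decimal megabytes per second (assumed dd unit)."""
    return num_bytes / seconds / 1e6


# Byte counts and elapsed times copied from the two dd runs in this post.
syncing = throughput_mb_per_s(104857600, 33.2383)   # resync running
idle = throughput_mb_per_s(8589934592, 27.3747)     # resync not running

print(f"syncing active:   {syncing:.1f} MB/s")
print(f"syncing inactive: {idle:.1f} MB/s")
print(f"slowdown factor:  {idle / syncing:.0f}x")
```

Note that the two runs use different block sizes (10M vs. 1024M), so the
ratio is only a rough indication of how much the running sync costs.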

My Problem: I'm only getting 10MB/sec sustained throughput during
initial re-sync of the DRBD device (~ 5TB), no matter whether I mount
the xfs device or leave it unmounted. I've tried various resync rates
without success, RAID & interface speed seem fine on non-drbd

/proc/drbd shows: (sorry for the line length)
# cat /proc/drbd 
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757 
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:48504579 nr:0 dw:9541797 dr:47365371 al:2346 bm:4890 lo:1
 pe:2770 ua:1741 ap:0 ep:1 wo:b oos:3252837352

        [>....................] sync'ed:  0.9% (3176596/3205188)M
        finish: 57:25:48 speed: 15,604 (11,448) K/sec
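
To put those counters in perspective, here is a minimal Python sketch
that pulls the key:value fields out of the status lines above and
compares the current speed with the configured syncer rate. The helper
name parse_drbd_fields is hypothetical. The sketch assumes oos is
counted in KiB (which matches the 3176596 M shown in the progress line)
and that the sync speed stays constant, so the estimate is rough and
need not match DRBD's own "finish" value.

```python
import re

# Status lines copied from the /proc/drbd output in this post.
STATUS = """\
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:48504579 nr:0 dw:9541797 dr:47365371 al:2346 bm:4890 lo:1
 pe:2770 ua:1741 ap:0 ep:1 wo:b oos:3252837352
"""


def parse_drbd_fields(text: str) -> dict:
    """Collect every lowercase key:value token from the status lines."""
    return dict(re.findall(r"\b([a-z]+):(\S+)", text))


fields = parse_drbd_fields(STATUS)
oos_kib = int(fields["oos"])     # data still out of sync, assumed KiB
speed_kib_s = 15604              # current speed from the "speed:" line
rate_kib_s = 102400              # configured syncer rate (102400k)

hours = oos_kib / speed_kib_s / 3600
print(f"connection state: {fields['cs']}")
print(f"out of sync:      {oos_kib / 1024:.0f} MiB")
print(f"estimated time:   {hours:.1f} h at the current speed")
print(f"rate utilisation: {100 * speed_kib_s / rate_kib_s:.0f}% of configured rate")
```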

I've asked around on the #drbd irc channel, and the friendly folks
over there pointed me to this mailing list to describe my problems
here in the hope someone more knowledgeable can step in and help.

Please let me know if you need any other info and I'll provide what I
can. The system isn't in production yet, so load on both servers is

All the best & thanks in advance for your comments, 


