[DRBD-user] Limiting the DRBD resync speed

Thu Sep 6 13:26:30 CEST 2012

Hello,
we've got a drbd-8.4.1 installation running in a custom HA setup. The controllers are connected through an Ethernet switch over a 100 Mbit/s link. To keep our services responsive, we need to keep resource usage low during initial/full synchronization, so we want to cap the resync speed at 1M. We've tried both a fixed resync-rate and the dynamic sync rate controller, but neither keeps the speed at that limit. We also tried drbd-8.4.2rc3, with the same result. Here is our DRBD configuration:

# /etc/drbd.d/global_common.conf:
global {
  usage-count no;
}
common {
  net {
    protocol C;
    verify-alg md5;
    after-sb-0pri discard-least-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    ping-timeout 15;
  }
  disk {
    fencing resource-only;
    al-extents 7;
    disk-flushes no;
    disk-barrier no;
    md-flushes no;
  }
  handlers {
    pri-on-incon-degr    "/usr/local/sbin/drbd-handler.sh pri-on-incon-degr";
    pri-lost-after-sb    "/usr/local/sbin/drbd-handler.sh pri-lost-after-sb";
    local-io-error       "/usr/local/sbin/drbd-handler.sh handler local-io-error";
    fence-peer           "/usr/local/sbin/drbd-handler.sh fence-peer";
    pri-lost             "/usr/local/sbin/drbd-handler.sh pri-lost";
    initial-split-brain  "/usr/local/sbin/drbd-handler.sh initial-split-brain";
    split-brain          "/usr/local/sbin/drbd-handler.sh split-brain";
    out-of-sync          "/usr/local/sbin/drbd-handler.sh out-of-sync";
    before-resync-target "/usr/local/sbin/drbd-handler.sh before-resync-target";
    after-resync-target  "/usr/local/sbin/drbd-handler.sh after-resync-target";
    before-resync-source "/usr/local/sbin/drbd-handler.sh before-resync-source";
  }
}

# /etc/drbd.d/r5.res:
resource r5 {
  device    /dev/drbd5 minor 5;
  disk      /dev/ram3;
  flexible-meta-disk /dev/ram4;
  floating 169.254.0.9:7794;
  floating 169.254.0.10:7794;
  disk {
    # force a fixed resync-rate:
    #   c-plan-ahead 0;
    #   resync-rate 1M;
    # or use the dynamic sync rate controller:
    c-plan-ahead 24;
    c-fill-target 128;
    c-min-rate 1;
    c-max-rate 1M;

    resync-after r4;
    size 39M;
  }
}

In this case r5 is backed by a RAM disk, which is fast enough to reproduce the problem. Here is the runtime status while using a fixed resync-rate:
# drbdsetup show 5
resource r5 {
    options {
    }
    net {
        after-sb-0pri           discard-least-changes;
        after-sb-1pri           discard-secondary;
        ping-timeout            15; # 1/10 seconds
        verify-alg              "md5";
    }
    _remote_host {
        address                 ipv4 169.254.0.10:7794;
    }
    _this_host {
        address                 ipv4 169.254.0.9:7794;
        volume 0 {
            device                      minor 5;
            disk                        "/dev/ram3";
            meta-disk                   "/dev/ram4";
            disk {
                size                    79872s; # bytes
                fencing                 resource-only;
                resync-rate             1024k; # bytes/second
                resync-after            4;
                al-extents              7;
                c-plan-ahead            0; # 1/10 seconds
            }
        }
    }
}
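The values printed by `drbdsetup show` carry unit suffixes, and the `# bytes` comments next to them are easy to misread. The Python sketch below converts them to bytes, assuming `s` means a 512-byte sector and `k`/`M` are binary multiples. That assumption is consistent with `size 39M` in r5.res, since 79872 x 512 bytes is exactly 39 MiB. The helper name `to_bytes` is hypothetical.

```python
# Hypothetical helper: convert the suffixed values printed by `drbdsetup show`
# into bytes. Assumptions: "s" = 512-byte sector, "k"/"K" = 1024 bytes,
# "M" = 1024*1024 bytes, "G" = 1024**3 bytes; a bare number is returned as-is.
SUFFIX_BYTES = {"s": 512, "k": 1024, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value: str) -> int:
    """Return the byte count for a drbdsetup-style value such as '79872s'."""
    value = value.strip()
    if value and value[-1] in SUFFIX_BYTES:
        return int(value[:-1]) * SUFFIX_BYTES[value[-1]]
    return int(value)


if __name__ == "__main__":
    # Values copied from the `drbdsetup show 5` output above.
    size = to_bytes("79872s")
    fill = to_bytes("128s")
    rate = to_bytes("1024k")
    print(size, size // 1024 ** 2)  # size in bytes and in MiB
    print(fill, fill // 1024)       # c-fill-target in bytes and in KiB
    print(rate == to_bytes("1M"))   # resync-rate 1024k is the same as 1M
```

Under these assumptions, c-fill-target 128s corresponds to 64 KiB of data in flight, and resync-rate 1024k is the same 1M that was configured.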

# cat /proc/drbd
version: 8.4.2rc3 (api:1/proto:86-101)
GIT-hash: f25fbeaaf922cb91e33a9c2eb99560473b0b2122 build by build at lnxdev, 2012-09-05 15:40:38
(...)
 5: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:38912 dw:38912 dr:0 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:1024
    [===================>] sync'ed:100.0% (1024/39936)K
    finish: 0:00:00 speed: 5,556 (5,556) want: 1,024 K/sec

And here, using the dynamic sync rate controller:
# drbdsetup show 5
resource r5 {
    options {
    }
    net {
        after-sb-0pri           discard-least-changes;
        after-sb-1pri           discard-secondary;
        ping-timeout            15; # 1/10 seconds
        verify-alg              "md5";
    }
    _remote_host {
        address                 ipv4 169.254.0.10:7794;
    }
    _this_host {
        address                 ipv4 169.254.0.9:7794;
        volume 0 {
            device                      minor 5;
            disk                        "/dev/ram3";
            meta-disk                   "/dev/ram4";
            disk {
                size                    79872s; # bytes
                fencing                 resource-only;
                resync-after            4;
                al-extents              7;
                c-plan-ahead            24; # 1/10 seconds
                c-fill-target           128s; # bytes
                c-max-rate              1024k; # bytes/second
                c-min-rate              1k; # bytes/second
            }
        }
    }
}

# cat /proc/drbd
version: 8.4.2rc3 (api:1/proto:86-101)
GIT-hash: f25fbeaaf922cb91e33a9c2eb99560473b0b2122 build by build at lnxdev, 2012-09-05 15:40:38
(...)
 5: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:25088 dw:25088 dr:0 al:0 bm:4 lo:1 pe:1 ua:0 ap:0 ep:1 wo:d oos:14848
    [=============>......] sync'ed: 70.0% (14848/39936)K
    finish: 0:00:02 speed: 5,016 (5,016) want: 1,000 K/sec
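How far each run overshoots the limit can be read straight off the progress lines of `/proc/drbd`. A minimal Python sketch, assuming the line format shown above (figures in K/sec, `,` as thousands separator); the regex and the `overshoot` function are illustrative, not part of any DRBD tool.

```python
import re

# Matches the progress line of /proc/drbd, e.g.
#   finish: 0:00:00 speed: 5,556 (5,556) want: 1,024 K/sec
# Assumption: all figures are K/sec and use ',' as a thousands separator.
PROGRESS_RE = re.compile(
    r"speed:\s*([\d,]+)\s*\(([\d,]+)\)\s*want:\s*([\d,]+)\s*K/sec"
)


def overshoot(line: str) -> float:
    """Return the current speed divided by the requested ('want') rate."""
    m = PROGRESS_RE.search(line)
    if m is None:
        raise ValueError("no speed/want figures in: " + line)
    speed, _avg, want = (int(g.replace(",", "")) for g in m.groups())
    return speed / want


if __name__ == "__main__":
    fixed = "finish: 0:00:00 speed: 5,556 (5,556) want: 1,024 K/sec"
    dynamic = "finish: 0:00:02 speed: 5,016 (5,016) want: 1,000 K/sec"
    for name, line in (("fixed", fixed), ("dynamic", dynamic)):
        print(f"{name}: {overshoot(line):.2f}x the requested rate")
```

For the two samples above this gives roughly 5.4x (fixed rate) and 5.0x (dynamic controller) the requested rate.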

In both cases, the kernel log reported an average resync speed of more than 5M by the end, which is close to the maximum we can get:
(...)
2012-09-06T08:46:26.181424+00:00 hatest kernel: d-con r5: Handshake successful: Agreed network protocol version 101
2012-09-06T08:46:26.242833+00:00 hatest kernel: d-con r5: conn( WFConnection -> WFReportParams ) 
2012-09-06T08:46:26.243035+00:00 hatest kernel: d-con r5: Starting asender thread (from drbd_r_r5 [4667])
2012-09-06T08:46:26.468356+00:00 hatest kernel: block drbd5: drbd_sync_handshake:
2012-09-06T08:46:26.468935+00:00 hatest kernel: block drbd5: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
2012-09-06T08:46:26.469077+00:00 hatest kernel: block drbd5: peer 8650ED8A7A065A7B:72980EB66AD1359D:3DD9B18961267343:0000000000000004 bits:0 flags:0
2012-09-06T08:46:26.469184+00:00 hatest kernel: block drbd5: uuid_compare()=-2 by rule 20
2012-09-06T08:46:26.469274+00:00 hatest kernel: block drbd5: Becoming sync target due to disk states.
2012-09-06T08:46:26.469386+00:00 hatest kernel: block drbd5: Writing the whole bitmap, full sync required after drbd_sync_handshake.
2012-09-06T08:46:26.629246+00:00 hatest kernel: block drbd5: bitmap WRITE of 1 pages took 0 jiffies
2012-09-06T08:46:26.632793+00:00 hatest kernel: block drbd5: meta data flush failed with status -5, disabling md-flushes
2012-09-06T08:46:26.632930+00:00 hatest kernel: block drbd5: 39 MB (9984 bits) marked out-of-sync by on disk bit-map.
2012-09-06T08:46:26.633042+00:00 hatest kernel: block drbd5: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) 
2012-09-06T08:46:26.773857+00:00 hatest kernel: block drbd5: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 98.4%
2012-09-06T08:46:26.774333+00:00 hatest kernel: block drbd5: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 98.4%
2012-09-06T08:46:26.774468+00:00 hatest kernel: block drbd5: conn( WFBitMapT -> WFSyncUUID ) 
2012-09-06T08:46:27.272671+00:00 hatest kernel: block drbd5: updated sync uuid 72990EB66AD1359C:0000000000000000:0000000000000000:0000000000000000
2012-09-06T08:46:27.278278+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm before-resync-target minor-5
2012-09-06T08:46:27.278464+00:00 hatest kernel: block drbd6: peer_isp( 0 -> 1 ) 
2012-09-06T08:46:27.278580+00:00 hatest kernel: block drbd7: peer_isp( 0 -> 1 ) 
2012-09-06T08:46:27.278683+00:00 hatest kernel: block drbd7: aftr_isp( 0 -> 1 ) 
2012-09-06T08:46:27.278772+00:00 hatest kernel: block drbd8: aftr_isp( 0 -> 1 ) 
2012-09-06T08:46:27.506348+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm before-resync-target minor-5 exit code 0 (0x0)
2012-09-06T08:46:27.507006+00:00 hatest kernel: block drbd5: conn( WFSyncUUID -> SyncTarget ) 
2012-09-06T08:46:27.507140+00:00 hatest kernel: block drbd6: aftr_isp( 0 -> 1 ) 
2012-09-06T08:46:27.507243+00:00 hatest kernel: block drbd5: Began resync as SyncTarget (will sync 39936 KB [9984 bits set]).
2012-09-06T08:46:35.522342+00:00 hatest kernel: block drbd5: Resync done (total 7 sec; paused 0 sec; 5704 K/sec)
2012-09-06T08:46:35.528636+00:00 hatest kernel: block drbd5: updated UUIDs 8650ED8A7A065A7A:0000000000000000:72990EB66AD1359C:72980EB66AD1359D
2012-09-06T08:46:35.528822+00:00 hatest kernel: block drbd5: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) 
2012-09-06T08:46:35.528916+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm after-resync-target minor-5
2012-09-06T08:46:35.682465+00:00 hatest kernel: block drbd6: peer_isp( 1 -> 0 ) 
2012-09-06T08:46:35.721770+00:00 hatest kernel: block drbd6: aftr_isp( 1 -> 0 ) 
2012-09-06T08:46:35.721890+00:00 hatest kernel: block drbd7: aftr_isp( 1 -> 0 ) 
2012-09-06T08:46:35.811816+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm after-resync-target minor-5 exit code 0 (0x0)
(...)
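To put the log figures in perspective: a 1M limit should have stretched this resync to roughly 39 seconds, not 7. The Python sketch below does that arithmetic, assuming "KB" and "K/sec" in the log both mean KiB; the constant and function names are hypothetical.

```python
# Worked arithmetic for the resync reported in the kernel log above:
#   "Began resync as SyncTarget (will sync 39936 KB [9984 bits set])."
#   "Resync done (total 7 sec; paused 0 sec; 5704 K/sec)"
# Assumption: "KB" and "K/sec" both mean KiB, as elsewhere in this output.
TOTAL_KIB = 39936
OBSERVED_SEC = 7
TARGET_KIB_PER_SEC = 1024  # resync-rate / c-max-rate of 1M


def seconds_at(rate_kib_per_sec: float, total_kib: int = TOTAL_KIB) -> float:
    """Time needed to copy total_kib at a constant rate."""
    return total_kib / rate_kib_per_sec


if __name__ == "__main__":
    observed_rate = TOTAL_KIB / OBSERVED_SEC
    print(f"observed: ~{observed_rate:.0f} K/sec")
    print(f"expected at 1M: ~{seconds_at(TARGET_KIB_PER_SEC):.0f} sec")
```

The computed ~5705 K/sec agrees closely with the 5704 K/sec DRBD itself logged.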

How should we set up the sync rate controller, or a fixed resync-rate, so that the 1M limit is actually honored?
Regards,
Lucas