Hello,
we've got a drbd-8.4.1 installation running in a custom HA setup. The controllers are connected via a 100M Ethernet link through an Ethernet switch. To keep our services responsive, we need to keep resource usage low during initial/full synchronization. We've tried both a fixed resync-rate and the dynamic sync rate controller to cap the resync speed at 1M, and neither kept the resync below that limit. We also gave drbd-8.4.2rc3 a try and still failed to achieve the goal. Here is our DRBD configuration:
# /etc/drbd.d/global_common.conf:
global {
    usage-count no;
}
common {
    net {
        protocol C;
        verify-alg md5;
        after-sb-0pri discard-least-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        ping-timeout 15;
    }
    disk {
        fencing resource-only;
        al-extents 7;
        disk-flushes no;
        disk-barrier no;
        md-flushes no;
    }
    handlers {
        pri-on-incon-degr "/usr/local/sbin/drbd-handler.sh pri-on-incon-degr";
        pri-lost-after-sb "/usr/local/sbin/drbd-handler.sh pri-lost-after-sb";
        local-io-error "/usr/local/sbin/drbd-handler.sh handler local-io-error";
        fence-peer "/usr/local/sbin/drbd-handler.sh fence-peer";
        pri-lost "/usr/local/sbin/drbd-handler.sh pri-lost";
        initial-split-brain "/usr/local/sbin/drbd-handler.sh initial-split-brain";
        split-brain "/usr/local/sbin/drbd-handler.sh split-brain";
        out-of-sync "/usr/local/sbin/drbd-handler.sh out-of-sync";
        before-resync-target "/usr/local/sbin/drbd-handler.sh before-resync-target";
        after-resync-target "/usr/local/sbin/drbd-handler.sh after-resync-target";
        before-resync-source "/usr/local/sbin/drbd-handler.sh before-resync-source";
    }
}
# /etc/drbd.d/r5.res:
resource r5 {
    device /dev/drbd5 minor 5;
    disk /dev/ram3;
    flexible-meta-disk /dev/ram4;
    floating 169.254.0.9:7794;
    floating 169.254.0.10:7794;
    disk {
        # force a fixed resync-rate:
        # c-plan-ahead 0;
        # resync-rate 1M;
        # or use the dynamic sync rate controller:
        c-plan-ahead 24;
        c-fill-target 128;
        c-min-rate 1;
        c-max-rate 1M;
        resync-after r4;
        size 39M;
    }
}
In this case r5 is a ramdisk, which is fast enough to show the problem. Here is some runtime status while using a fixed resync-rate:
# drbdsetup show 5
resource r5 {
    options {
    }
    net {
        after-sb-0pri discard-least-changes;
        after-sb-1pri discard-secondary;
        ping-timeout 15; # 1/10 seconds
        verify-alg "md5";
    }
    _remote_host {
        address ipv4 169.254.0.10:7794;
    }
    _this_host {
        address ipv4 169.254.0.9:7794;
        volume 0 {
            device minor 5;
            disk "/dev/ram3";
            meta-disk "/dev/ram4";
            disk {
                size 79872s; # bytes
                fencing resource-only;
                resync-rate 1024k; # bytes/second
                resync-after 4;
                al-extents 7;
                c-plan-ahead 0; # 1/10 seconds
            }
        }
    }
}
# cat /proc/drbd
version: 8.4.2rc3 (api:1/proto:86-101)
GIT-hash: f25fbeaaf922cb91e33a9c2eb99560473b0b2122 build by build at lnxdev, 2012-09-05 15:40:38
(...)
5: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
ns:0 nr:38912 dw:38912 dr:0 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:1024
[===================>] sync'ed:100.0% (1024/39936)K
finish: 0:00:00 speed: 5,556 (5,556) want: 1,024 K/sec
And here, using the dynamic sync rate controller:
# drbdsetup show 5
resource r5 {
    options {
    }
    net {
        after-sb-0pri discard-least-changes;
        after-sb-1pri discard-secondary;
        ping-timeout 15; # 1/10 seconds
        verify-alg "md5";
    }
    _remote_host {
        address ipv4 169.254.0.10:7794;
    }
    _this_host {
        address ipv4 169.254.0.9:7794;
        volume 0 {
            device minor 5;
            disk "/dev/ram3";
            meta-disk "/dev/ram4";
            disk {
                size 79872s; # bytes
                fencing resource-only;
                resync-after 4;
                al-extents 7;
                c-plan-ahead 24; # 1/10 seconds
                c-fill-target 128s; # bytes
                c-max-rate 1024k; # bytes/second
                c-min-rate 1k; # bytes/second
            }
        }
    }
}
# cat /proc/drbd
version: 8.4.2rc3 (api:1/proto:86-101)
GIT-hash: f25fbeaaf922cb91e33a9c2eb99560473b0b2122 build by build at lnxdev, 2012-09-05 15:40:38
(...)
5: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
ns:0 nr:25088 dw:25088 dr:0 al:0 bm:4 lo:1 pe:1 ua:0 ap:0 ep:1 wo:d oos:14848
[=============>......] sync'ed: 70.0% (14848/39936)K
finish: 0:00:02 speed: 5,016 (5,016) want: 1,000 K/sec
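To put a number on the gap between "speed" and "want" in the two status snapshots above, here is a minimal Python sketch that parses that line and reports how far the reported speed exceeds the requested rate. The regular expression is based only on the two lines quoted here, and the helper names (parse_speed_line, overshoot) are made up for illustration; they are not part of any DRBD tool.

```python
import re

# Matches the "speed: ... (...) want: ... K/sec" part of a /proc/drbd
# progress line, in the form shown in the snapshots above.
STATUS_RE = re.compile(
    r"speed:\s*([\d,]+)\s*\(([\d,]+)\)\s*want:\s*([\d,]+)\s*K/sec"
)


def parse_speed_line(line):
    """Return (speed, speed_in_parentheses, want) in K/sec, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    # The kernel prints thousands separators ("5,556"); strip them.
    return tuple(int(group.replace(",", "")) for group in match.groups())


def overshoot(line):
    """Ratio of the first reported speed to the requested rate, or None."""
    parsed = parse_speed_line(line)
    if parsed is None:
        return None
    speed, _in_parentheses, want = parsed
    return speed / want


if __name__ == "__main__":
    samples = (
        "finish: 0:00:00 speed: 5,556 (5,556) want: 1,024 K/sec",  # fixed rate
        "finish: 0:00:02 speed: 5,016 (5,016) want: 1,000 K/sec",  # controller
    )
    for sample in samples:
        print(parse_speed_line(sample), round(overshoot(sample), 2))
```

In both snapshots the reported speed is roughly five times the requested rate.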
In both cases the kernel messages report an average resync speed above 5M at the end (5704 K/sec in the log below), which is almost everything we can get:
(...)
2012-09-06T08:46:26.181424+00:00 hatest kernel: d-con r5: Handshake successful: Agreed network protocol version 101
2012-09-06T08:46:26.242833+00:00 hatest kernel: d-con r5: conn( WFConnection -> WFReportParams )
2012-09-06T08:46:26.243035+00:00 hatest kernel: d-con r5: Starting asender thread (from drbd_r_r5 [4667])
2012-09-06T08:46:26.468356+00:00 hatest kernel: block drbd5: drbd_sync_handshake:
2012-09-06T08:46:26.468935+00:00 hatest kernel: block drbd5: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
2012-09-06T08:46:26.469077+00:00 hatest kernel: block drbd5: peer 8650ED8A7A065A7B:72980EB66AD1359D:3DD9B18961267343:0000000000000004 bits:0 flags:0
2012-09-06T08:46:26.469184+00:00 hatest kernel: block drbd5: uuid_compare()=-2 by rule 20
2012-09-06T08:46:26.469274+00:00 hatest kernel: block drbd5: Becoming sync target due to disk states.
2012-09-06T08:46:26.469386+00:00 hatest kernel: block drbd5: Writing the whole bitmap, full sync required after drbd_sync_handshake.
2012-09-06T08:46:26.629246+00:00 hatest kernel: block drbd5: bitmap WRITE of 1 pages took 0 jiffies
2012-09-06T08:46:26.632793+00:00 hatest kernel: block drbd5: meta data flush failed with status -5, disabling md-flushes
2012-09-06T08:46:26.632930+00:00 hatest kernel: block drbd5: 39 MB (9984 bits) marked out-of-sync by on disk bit-map.
2012-09-06T08:46:26.633042+00:00 hatest kernel: block drbd5: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
2012-09-06T08:46:26.773857+00:00 hatest kernel: block drbd5: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 98.4%
2012-09-06T08:46:26.774333+00:00 hatest kernel: block drbd5: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 98.4%
2012-09-06T08:46:26.774468+00:00 hatest kernel: block drbd5: conn( WFBitMapT -> WFSyncUUID )
2012-09-06T08:46:27.272671+00:00 hatest kernel: block drbd5: updated sync uuid 72990EB66AD1359C:0000000000000000:0000000000000000:0000000000000000
2012-09-06T08:46:27.278278+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm before-resync-target minor-5
2012-09-06T08:46:27.278464+00:00 hatest kernel: block drbd6: peer_isp( 0 -> 1 )
2012-09-06T08:46:27.278580+00:00 hatest kernel: block drbd7: peer_isp( 0 -> 1 )
2012-09-06T08:46:27.278683+00:00 hatest kernel: block drbd7: aftr_isp( 0 -> 1 )
2012-09-06T08:46:27.278772+00:00 hatest kernel: block drbd8: aftr_isp( 0 -> 1 )
2012-09-06T08:46:27.506348+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm before-resync-target minor-5 exit code 0 (0x0)
2012-09-06T08:46:27.507006+00:00 hatest kernel: block drbd5: conn( WFSyncUUID -> SyncTarget )
2012-09-06T08:46:27.507140+00:00 hatest kernel: block drbd6: aftr_isp( 0 -> 1 )
2012-09-06T08:46:27.507243+00:00 hatest kernel: block drbd5: Began resync as SyncTarget (will sync 39936 KB [9984 bits set]).
2012-09-06T08:46:35.522342+00:00 hatest kernel: block drbd5: Resync done (total 7 sec; paused 0 sec; 5704 K/sec)
2012-09-06T08:46:35.528636+00:00 hatest kernel: block drbd5: updated UUIDs 8650ED8A7A065A7A:0000000000000000:72990EB66AD1359C:72980EB66AD1359D
2012-09-06T08:46:35.528822+00:00 hatest kernel: block drbd5: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
2012-09-06T08:46:35.528916+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm after-resync-target minor-5
2012-09-06T08:46:35.682465+00:00 hatest kernel: block drbd6: peer_isp( 1 -> 0 )
2012-09-06T08:46:35.721770+00:00 hatest kernel: block drbd6: aftr_isp( 1 -> 0 )
2012-09-06T08:46:35.721890+00:00 hatest kernel: block drbd7: aftr_isp( 1 -> 0 )
2012-09-06T08:46:35.811816+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm after-resync-target minor-5 exit code 0 (0x0)
(...)
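For comparison, the arithmetic behind the expectation can be sketched in a few lines of Python. It uses only figures quoted above: the device size of 79872 sectors (assuming the usual 512-byte sector unit), the 39936 KB and 9984 bits from the log, the 1024 K/sec limit, and the two log timestamps for "Began resync" and "Resync done". The function names are illustrative only.

```python
from datetime import datetime

SECTOR_BYTES = 512  # assumption: the "s" suffix in "79872s" means 512-byte sectors


def sectors_to_kb(sectors):
    """Convert a sector count to KB (1 KB = 1024 bytes)."""
    return sectors * SECTOR_BYTES // 1024


def expected_seconds(total_kb, rate_kb_per_s):
    """Time a resync would take if the rate limit were honored exactly."""
    return total_kb / rate_kb_per_s


def elapsed_seconds(start, end):
    """Wall-clock seconds between two ISO 8601 log timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


if __name__ == "__main__":
    total_kb = sectors_to_kb(79872)
    print("device size:", total_kb, "KB")            # matches "will sync 39936 KB"
    print("KB per bitmap bit:", total_kb // 9984)    # 9984 bits set in the log

    wanted = expected_seconds(total_kb, 1024)
    actual = elapsed_seconds(
        "2012-09-06T08:46:27.507140+00:00",  # Began resync as SyncTarget
        "2012-09-06T08:46:35.522342+00:00",  # Resync done
    )
    print("expected at 1024 K/sec:", wanted, "s")
    print("observed from timestamps:", round(actual, 3), "s")
    print("effective rate:", round(total_kb / actual), "K/sec")
```

At 1024 K/sec the 39936 KB should take about 39 seconds, while the timestamps show the resync finishing in roughly 8 seconds.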
How do we have to set up the sync rate controller or a fixed resync-rate so that the resync actually stays at the configured 1M?
Regards,
Lucas