Hello,

we've got a drbd-8.4.1 installation running in a custom HA setup. The
controllers are connected via a 100M Ethernet link through an Ethernet
switch. To keep our services responsive, we need to keep resource usage
low during initial/full synchronization. We've tried both a fixed
resync-rate and the dynamic sync rate controller to cap the resync speed
at 1M. We also gave drbd-8.4.2rc3 a try, but still failed to achieve the
goal.

Here is our DRBD configuration:

# /etc/drbd.d/global_common.conf:
global {
    usage-count no;
}
common {
    net {
        protocol C;
        verify-alg md5;
        after-sb-0pri discard-least-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        ping-timeout 15;
    }
    disk {
        fencing resource-only;
        al-extents 7;
        disk-flushes no;
        disk-barrier no;
        md-flushes no;
    }
    handlers {
        pri-on-incon-degr "/usr/local/sbin/drbd-handler.sh pri-on-incon-degr";
        pri-lost-after-sb "/usr/local/sbin/drbd-handler.sh pri-lost-after-sb";
        local-io-error "/usr/local/sbin/drbd-handler.sh handler local-io-error";
        fence-peer "/usr/local/sbin/drbd-handler.sh fence-peer";
        pri-lost "/usr/local/sbin/drbd-handler.sh pri-lost";
        initial-split-brain "/usr/local/sbin/drbd-handler.sh initial-split-brain";
        split-brain "/usr/local/sbin/drbd-handler.sh split-brain";
        out-of-sync "/usr/local/sbin/drbd-handler.sh out-of-sync";
        before-resync-target "/usr/local/sbin/drbd-handler.sh before-resync-target";
        after-resync-target "/usr/local/sbin/drbd-handler.sh after-resync-target";
        before-resync-source "/usr/local/sbin/drbd-handler.sh before-resync-source";
    }
}

# /etc/drbd.d/r5.res:
resource r5 {
    device /dev/drbd5 minor 5;
    disk /dev/ram3;
    flexible-meta-disk /dev/ram4;
    floating 169.254.0.9:7794;
    floating 169.254.0.10:7794;
    disk {
        # force a fixed resync-rate:
        # c-plan-ahead 0;
        # resync-rate 1M;
        # or use the dynamic sync rate controller:
        c-plan-ahead 24;
        c-fill-target 128;
        c-min-rate 1;
        c-max-rate 1M;
        resync-after r4;
        size 39M;
    }
}

In this case r5 is a ramdisk, which is fast enough to
show the problem.

Here is some runtime status while using a fixed resync-rate:

# drbdsetup show 5
resource r5 {
    options {
    }
    net {
        after-sb-0pri discard-least-changes;
        after-sb-1pri discard-secondary;
        ping-timeout 15; # 1/10 seconds
        verify-alg "md5";
    }
    _remote_host {
        address ipv4 169.254.0.10:7794;
    }
    _this_host {
        address ipv4 169.254.0.9:7794;
        volume 0 {
            device minor 5;
            disk "/dev/ram3";
            meta-disk "/dev/ram4";
            disk {
                size 79872s; # bytes
                fencing resource-only;
                resync-rate 1024k; # bytes/second
                resync-after 4;
                al-extents 7;
                c-plan-ahead 0; # 1/10 seconds
            }
        }
    }
}

# cat /proc/drbd
version: 8.4.2rc3 (api:1/proto:86-101)
GIT-hash: f25fbeaaf922cb91e33a9c2eb99560473b0b2122 build by build at lnxdev, 2012-09-05 15:40:38
(...)
 5: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:38912 dw:38912 dr:0 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:1024
        [===================>] sync'ed:100.0% (1024/39936)K
        finish: 0:00:00 speed: 5,556 (5,556) want: 1,024 K/sec

And here, using the dynamic sync rate controller:

# drbdsetup show 5
resource r5 {
    options {
    }
    net {
        after-sb-0pri discard-least-changes;
        after-sb-1pri discard-secondary;
        ping-timeout 15; # 1/10 seconds
        verify-alg "md5";
    }
    _remote_host {
        address ipv4 169.254.0.10:7794;
    }
    _this_host {
        address ipv4 169.254.0.9:7794;
        volume 0 {
            device minor 5;
            disk "/dev/ram3";
            meta-disk "/dev/ram4";
            disk {
                size 79872s; # bytes
                fencing resource-only;
                resync-after 4;
                al-extents 7;
                c-plan-ahead 24; # 1/10 seconds
                c-fill-target 128s; # bytes
                c-max-rate 1024k; # bytes/second
                c-min-rate 1k; # bytes/second
            }
        }
    }
}

# cat /proc/drbd
version: 8.4.2rc3 (api:1/proto:86-101)
GIT-hash: f25fbeaaf922cb91e33a9c2eb99560473b0b2122 build by build at lnxdev, 2012-09-05 15:40:38
(...)
 5: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:25088 dw:25088 dr:0 al:0 bm:4 lo:1 pe:1 ua:0 ap:0 ep:1 wo:d oos:14848
        [=============>......] sync'ed: 70.0% (14848/39936)K
        finish: 0:00:02 speed: 5,016 (5,016) want: 1,000 K/sec

The messages indicated 5+M average resync speed in both cases at the
end, which is almost everything we can get:

(...)
2012-09-06T08:46:26.181424+00:00 hatest kernel: d-con r5: Handshake successful: Agreed network protocol version 101
2012-09-06T08:46:26.242833+00:00 hatest kernel: d-con r5: conn( WFConnection -> WFReportParams )
2012-09-06T08:46:26.243035+00:00 hatest kernel: d-con r5: Starting asender thread (from drbd_r_r5 [4667])
2012-09-06T08:46:26.468356+00:00 hatest kernel: block drbd5: drbd_sync_handshake:
2012-09-06T08:46:26.468935+00:00 hatest kernel: block drbd5: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
2012-09-06T08:46:26.469077+00:00 hatest kernel: block drbd5: peer 8650ED8A7A065A7B:72980EB66AD1359D:3DD9B18961267343:0000000000000004 bits:0 flags:0
2012-09-06T08:46:26.469184+00:00 hatest kernel: block drbd5: uuid_compare()=-2 by rule 20
2012-09-06T08:46:26.469274+00:00 hatest kernel: block drbd5: Becoming sync target due to disk states.
2012-09-06T08:46:26.469386+00:00 hatest kernel: block drbd5: Writing the whole bitmap, full sync required after drbd_sync_handshake.
2012-09-06T08:46:26.629246+00:00 hatest kernel: block drbd5: bitmap WRITE of 1 pages took 0 jiffies
2012-09-06T08:46:26.632793+00:00 hatest kernel: block drbd5: meta data flush failed with status -5, disabling md-flushes
2012-09-06T08:46:26.632930+00:00 hatest kernel: block drbd5: 39 MB (9984 bits) marked out-of-sync by on disk bit-map.
2012-09-06T08:46:26.633042+00:00 hatest kernel: block drbd5: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
2012-09-06T08:46:26.773857+00:00 hatest kernel: block drbd5: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 98.4%
2012-09-06T08:46:26.774333+00:00 hatest kernel: block drbd5: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 98.4%
2012-09-06T08:46:26.774468+00:00 hatest kernel: block drbd5: conn( WFBitMapT -> WFSyncUUID )
2012-09-06T08:46:27.272671+00:00 hatest kernel: block drbd5: updated sync uuid 72990EB66AD1359C:0000000000000000:0000000000000000:0000000000000000
2012-09-06T08:46:27.278278+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm before-resync-target minor-5
2012-09-06T08:46:27.278464+00:00 hatest kernel: block drbd6: peer_isp( 0 -> 1 )
2012-09-06T08:46:27.278580+00:00 hatest kernel: block drbd7: peer_isp( 0 -> 1 )
2012-09-06T08:46:27.278683+00:00 hatest kernel: block drbd7: aftr_isp( 0 -> 1 )
2012-09-06T08:46:27.278772+00:00 hatest kernel: block drbd8: aftr_isp( 0 -> 1 )
2012-09-06T08:46:27.506348+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm before-resync-target minor-5 exit code 0 (0x0)
2012-09-06T08:46:27.507006+00:00 hatest kernel: block drbd5: conn( WFSyncUUID -> SyncTarget )
2012-09-06T08:46:27.507140+00:00 hatest kernel: block drbd6: aftr_isp( 0 -> 1 )
2012-09-06T08:46:27.507243+00:00 hatest kernel: block drbd5: Began resync as SyncTarget (will sync 39936 KB [9984 bits set]).
2012-09-06T08:46:35.522342+00:00 hatest kernel: block drbd5: Resync done (total 7 sec; paused 0 sec; 5704 K/sec)
2012-09-06T08:46:35.528636+00:00 hatest kernel: block drbd5: updated UUIDs 8650ED8A7A065A7A:0000000000000000:72990EB66AD1359C:72980EB66AD1359D
2012-09-06T08:46:35.528822+00:00 hatest kernel: block drbd5: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
2012-09-06T08:46:35.528916+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm after-resync-target minor-5
2012-09-06T08:46:35.682465+00:00 hatest kernel: block drbd6: peer_isp( 1 -> 0 )
2012-09-06T08:46:35.721770+00:00 hatest kernel: block drbd6: aftr_isp( 1 -> 0 )
2012-09-06T08:46:35.721890+00:00 hatest kernel: block drbd7: aftr_isp( 1 -> 0 )
2012-09-06T08:46:35.811816+00:00 hatest kernel: block drbd5: helper command: /sbin/drbdadm after-resync-target minor-5 exit code 0 (0x0)
(...)

How do we have to set up the sync rate controller or a fixed
resync-rate so that the 1M limit is actually honored?

Regards,
Lucas
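
P.S. To put a number on the gap between the configured and the observed
rate, something like the following could be used. It is only a rough
sketch, not part of DRBD: the function names parse_sync_rates and
expected_resync_seconds are made up for illustration, and the regular
expression assumes the "speed: ... (...) want: ... K/sec" layout exactly
as it appears in the /proc/drbd output quoted above. What the first and
the parenthesized speed value each mean is not interpreted here.

```python
import re

# Matches the resync progress fields as they appear in the /proc/drbd
# output quoted in this post, e.g. "speed: 5,016 (5,016) want: 1,000 K/sec".
# The layout is an assumption taken from that output only.
_RATE_RE = re.compile(
    r"speed:\s*([\d,]+)\s*\(([\d,]+)\)\s*want:\s*([\d,]+)\s*K/sec"
)


def parse_sync_rates(status_text):
    """Return the two speed values and the wanted rate in K/sec, or None."""
    match = _RATE_RE.search(status_text)
    if match is None:
        return None
    # Strip the thousands separators before converting to integers.
    speed, speed_paren, want = (
        int(group.replace(",", "")) for group in match.groups()
    )
    return {"speed": speed, "speed_paren": speed_paren, "want": want}


def expected_resync_seconds(total_kib, rate_kib_per_sec):
    """Time a full resync would take if the rate limit were honored."""
    return total_kib / rate_kib_per_sec


if __name__ == "__main__":
    sample = (
        "sync'ed: 70.0% (14848/39936)K "
        "finish: 0:00:02 speed: 5,016 (5,016) want: 1,000 K/sec"
    )
    rates = parse_sync_rates(sample)
    print("observed/wanted:", rates["speed"] / rates["want"])
    # 39936 KB is the amount reported by "Began resync as SyncTarget",
    # 1024 K/sec is the resync-rate shown by drbdsetup.
    print("expected seconds:", expected_resync_seconds(39936, 1024))
```

With the figures from this post, a full resync of 39936 KB at 1024 K/sec
should take about 39 seconds, whereas the log reports "total 7 sec".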