Lars,

note that the following issue description is from a DRBD setup that is well outside of design specs. This is from an IRC discussion with Paul Hedderly (CC'd, "prh" on freenode).

Debian sid box running 8.3.9 userland against a stock 3.1 kernel drbd.ko (equiv. DRBD 8.3.11), with protocol A over a DSL link, where the user has said he doesn't want to afford DRBD Proxy. So it's pretty much a symphony of "don't do that", but I'm just posting the issue here in case it's an underlying deeper problem that you want to investigate. Please, by all means, feel free to pass on this if you're busy.

This is the DRBD config for his misbehaving device, /dev/drbd12:

disk {
    size 0s _is_default; # bytes
    on-io-error detach;
    fencing dont-care _is_default;
    max-bio-bvecs 0 _is_default;
}
net {
    timeout 150; # 1/10 seconds
    max-epoch-size 2048 _is_default;
    max-buffers 1024;
    unplug-watermark 128 _is_default;
    connect-int 20; # seconds
    ping-int 20; # seconds
    sndbuf-size 0 _is_default; # bytes
    rcvbuf-size 0 _is_default; # bytes
    ko-count 0 _is_default;
    after-sb-0pri discard-least-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect _is_default;
    rr-conflict disconnect _is_default;
    ping-timeout 20; # 1/10 seconds
}
syncer {
    rate 250k _is_default; # bytes/second
    after 11;
    al-extents 127 _is_default;
    csums-alg "sha1";
    verify-alg "sha1";
    on-no-data-accessible io-error _is_default;
    c-plan-ahead 30; # 1/10 seconds
    c-delay-target 10 _is_default; # 1/10 seconds
    c-fill-target 32s; # bytes
    c-max-rate 800k; # bytes/second
    c-min-rate 1024k; # bytes/second
}
protocol A;
_this_host {
    device minor 12;
    disk "/dev/mew88/drbd_x2";
    flexible-meta-disk "/dev/mew88/dmeta_x2";
    address ipv4 10.88.8.167:2782;
}
_remote_host {
    address ipv4 10.88.2.2:7782;
}
# (81) unknown tag = (integer) 0 [len: 4]
# (82) unknown tag = (integer) 0 [len: 4]
# (83) unknown tag = (integer) 127 [len: 4]
# Found unknown tags, you should update your
# userland tools

Now, what he is reporting is that anytime that device is
being disconnected, it does a full sync. And judging by the logs that seems to indeed be true:

Nov 18 04:52:39 mew88 kernel: [167671.867930] block drbd12: drbd_sync_handshake:
Nov 18 04:52:39 mew88 kernel: [167671.867945] block drbd12: self 002A000000000000:0000000000000000:0000000000000000:0000000000000000 bits:1888384 flags:0
Nov 18 04:52:39 mew88 kernel: [167671.867959] block drbd12: peer 0000000000000005:002A000000000000:0029000000000000:0028000000000000 bits:1888584 flags:2
Nov 18 04:52:39 mew88 kernel: [167671.867971] block drbd12: uuid_compare()=2 by rule 30
Nov 18 04:52:39 mew88 kernel: [167671.867978] block drbd12: Becoming sync target due to disk states.

In the above the sync decision, AFAICT, is fine based on the UUIDs. The sync then actually does complete, hours later, which again is expected based on the amazingly slow link:

Nov 18 04:52:41 mew88 kernel: [167673.294137] block drbd12: Began resync as SyncTarget (will sync 20971520 KB [5242880 bits set]).
Nov 18 11:21:52 mew88 kernel: [191024.287840] block drbd12: Resync done (total 23350 sec; paused 0 sec; 896 K/sec)

But then, immediately after that, this:

Nov 18 11:21:52 mew88 kernel: [191024.287856] block drbd12: 65 % had equal checksums, eliminated: 13698048K; transferred 7273472K total 20971520K
Nov 18 11:21:52 mew88 kernel: [191024.287875] block drbd12: updated UUIDs 0000000000000004:0000000000000000:002B000000000000:002A000000000000

So if I understand correctly, it's updating the current UUID with UUID_JUST_CREATED. Hrm.

The link then breaks again, a few hours later:

Nov 18 15:06:29 mew88 kernel: [204501.759578] block drbd12: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Nov 18 15:06:29 mew88 kernel: [204501.770208] block drbd12: meta connection shut down by peer.
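
For reference, the handshake decision logged above can be reproduced with a small model. This is only a sketch of how I read the first three rules of drbd_uuid_compare() in the 8.3 series, not the kernel code itself: the function names parse_uuids and compare_current are made up for illustration, the treatment of a zero current UUID is an assumption, and all later rules are left out.

```python
# Simplified model of rules 10-30 of DRBD 8.3's UUID comparison.
# Assumption: based on a reading of drbd_uuid_compare(); later rules omitted.

UUID_JUST_CREATED = 0x4
PRIMARY_FLAG = 0x1  # lowest bit marks the Primary role; ignored when comparing


def parse_uuids(line):
    """Parse 'CURRENT:BITMAP:HISTORY1:HISTORY2' as logged by drbd_sync_handshake."""
    return [int(field, 16) for field in line.split(":")]


def compare_current(self_uuids, peer_uuids):
    """Return (result, rule) for rules 10-30, or (None, None) if none applies."""
    me = self_uuids[0] & ~PRIMARY_FLAG
    peer = peer_uuids[0] & ~PRIMARY_FLAG
    if me == UUID_JUST_CREATED and peer == UUID_JUST_CREATED:
        return 0, 10   # both sides freshly created: no sync
    if me in (UUID_JUST_CREATED, 0) and peer != UUID_JUST_CREATED:
        return -2, 20  # this node looks new: full sync, as target
    if me != UUID_JUST_CREATED and peer in (UUID_JUST_CREATED, 0):
        return 2, 30   # the peer looks new: full sync, as source
    return None, None


# First handshake from the log above (04:52:39).
first = compare_current(
    parse_uuids("002A000000000000:0000000000000000:0000000000000000:0000000000000000"),
    parse_uuids("0000000000000005:002A000000000000:0029000000000000:0028000000000000"),
)

# The UUID set written after the resync (11:21:52), compared against a
# hypothetical peer whose current UUID is any ordinary generation identifier.
after_resync = compare_current(
    parse_uuids("0000000000000004:0000000000000000:002B000000000000:002A000000000000"),
    parse_uuids("0123456789ABCDEF:0000000000000000:002B000000000000:002A000000000000"),
)
```

Under these assumptions, first matches the logged "uuid_compare()=2 by rule 30", and after_resync lands in rule 20, i.e. a node left with UUID_JUST_CREATED as its current UUID would be treated as brand new on the next handshake.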

And when the link is reinstated a couple of minutes later, sure enough, a full sync is triggered again, this time by rule 20:

Nov 18 15:08:50 mew88 kernel: [204642.358228] block drbd12: drbd_sync_handshake:
Nov 18 15:08:50 mew88 kernel: [204642.358243] block drbd12: self 0000000000000004:0000000000000000:002B000000000000:002A000000000000 bits:0 flags:0
Nov 18 15:08:50 mew88 kernel: [204642.358257] block drbd12: peer 38A8B488B7060CB3:0000000000000005:002B000000000000:002A000000000000 bits:573 flags:0
Nov 18 15:08:50 mew88 kernel: [204642.358269] block drbd12: uuid_compare()=-2 by rule 20
Nov 18 15:08:50 mew88 kernel: [204642.358277] block drbd12: Writing the whole bitmap, full sync required after drbd_sync_handshake.

Da capo al fine.

Is there a plausible explanation for that odd current UUID? I have a full kernel log if that is helpful.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now