On May 23, 2012, at 3:47 PM, Florian Haas wrote:

> On Wed, May 23, 2012 at 10:34 PM, Zev Weiss <zweiss at scout.wisc.edu> wrote:
>>
>> On May 23, 2012, at 3:22 PM, Florian Haas wrote:
>>
>>> On Wed, May 23, 2012 at 10:14 PM, Zev Weiss <zweiss at scout.wisc.edu> wrote:
>>>> Hi,
>>>>
>>>> I'm running DRBD 8.3.12, and recently hit what looks to me like a bug that was listed as fixed in 8.3.13 -- getting into a state where both nodes are in SyncSource (it's just stuck like that, going nowhere). Luckily this happened on a test resource and not a live one, so it's not a big problem, but I was wondering if there were any known ways of recovering it without doing anything disruptive to the other resources (e.g. rebooting or unloading the kernel module).
>>>>
>>>> I've tried 'drbdadm down', but it just hangs -- anyone have any other suggestions? It doesn't really matter to me if it wipes the resource or anything, I'd just like to have my test device back in a working state without disturbing anything else.
>>>
>>> Can you post /proc/drbd contents from both nodes here?
>>>
>>
>> Sure -- here's one node:
>>
>> version: 8.3.12 (api:88/proto:86-96)
>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss at mydomain, 2012-03-14 19:52:38
>>
>> <snip other resources>
>>  9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>>     [>...................] sync'ed: 5.9% (65536/65536)K
>>     finish: 19046:04:53 speed: 0 (0 -- 0) K/sec (stalled)
>>     0% sector pos: 0/10698352
>>     resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>>     act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
>>
>>
>> And here's the other:
>>
>> version: 8.3.12 (api:88/proto:86-96)
>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss at fromage.scout.wisc.edu, 2012-03-14 19:52:38
>>
>> <snip other resources>
>>  9: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
>>     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>>     [>...................] sync'ed: 5.9% (65536/65536)K
>>     finish: 18987:55:05 speed: 0 (0 -- 0) K/sec (stalled)
>>     0% sector pos: 0/10698352
>>     resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>>     act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
>
> Ugh. Can you force the device into the WFConnection state by injecting
> a couple of iptables rules blocking the replication port, and then
> "down" the resource?

I've now inserted iptables rules on both sides for the relevant replication port (reject-with icmp-port-unreachable), but there has been no transition to WFConnection -- it's still stuck in the same state (and, unsurprisingly, 'down' still just hangs).

Zev
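The telling detail in the two /proc/drbd dumps is that both peers report cs:SyncSource for minor 9, so neither side is acting as SyncTarget and the resync can never make progress. A minimal Python sketch for spotting that condition from two saved dumps follows. It assumes the DRBD 8.3 status-line layout shown in the thread (`N: cs:... ro:... ds:...`). The helpers `parse_status` and `both_sync_source` are hypothetical illustrations, not part of drbd-utils, and they only read the text; they do nothing to recover the resource.

```python
import re
from typing import Dict, List

# Hypothetical parser for a DRBD 8.3-style resource status line, e.g.
# " 9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----"
# Assumes the layout seen in the thread's /proc/drbd output.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_status(text: str) -> Dict[int, Dict[str, str]]:
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each resource line.

    Lines that are not resource status lines (version, GIT-hash, counters,
    progress bar, resync/act_log stats) simply do not match and are skipped.
    """
    result: Dict[int, Dict[str, str]] = {}
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            result[int(m.group("minor"))] = {
                "cs": m.group("cs"),
                "ro": m.group("ro"),
                "ds": m.group("ds"),
            }
    return result


def both_sync_source(node_a: str, node_b: str) -> List[int]:
    """Minors for which BOTH nodes claim cs:SyncSource (no side is SyncTarget)."""
    a, b = parse_status(node_a), parse_status(node_b)
    return sorted(
        minor
        for minor in a.keys() & b.keys()
        if a[minor]["cs"] == "SyncSource" and b[minor]["cs"] == "SyncSource"
    )


if __name__ == "__main__":
    # Status lines copied from the two nodes in the thread.
    node1 = " 9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----"
    node2 = " 9: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----"
    print(both_sync_source(node1, node2))  # [9]
```

In practice the two inputs would be the full contents of /proc/drbd captured on each node; a healthy resync would show SyncSource on one side and SyncTarget on the other, so an empty list is the expected result.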