On Wed, May 23, 2012 at 04:34:28PM -0500, Zev Weiss wrote:
> >>>>>> I'm running DRBD 8.3.12, and recently hit what looks to me like
> >>>>>> a bug that was listed as fixed in 8.3.13 -- getting into a
> >>>>>> state where both nodes are in SyncSource (it's just stuck like
> >>>>>> that, going nowhere). Luckily this happened on a test resource
> >>>>>> and not a live one, so it's not a big problem, but I was
> >>>>>> wondering if there were any known ways of recovering it without
> >>>>>> doing anything disruptive to the other resources (e.g.
> >>>>>> rebooting or unloading the kernel module).
> >>>>>>
> >>>>>> I've tried 'drbdadm down', but it just hangs -- anyone have any
> >>>>>> other suggestions? It doesn't really matter to me if it wipes
> >>>>>> the resource or anything, I'd just like to have my test device
> >>>>>> back in a working state without disturbing anything else.
> >>>>>
> >>>>> Can you post /proc/drbd contents from both nodes here?
> >>>>>
> >>>>
> >>>> Sure -- here's one node:
> >>>>
> >>>> version: 8.3.12 (api:88/proto:86-96)
> >>>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss at mydomain, 2012-03-14 19:52:38
> >>>>
> >>>> <snip other resources>
> >>>>  9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
> >>>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
> >>>>     [>...................] sync'ed: 5.9% (65536/65536)K
> >>>>     finish: 19046:04:53 speed: 0 (0 -- 0) K/sec (stalled)
> >>>>     0% sector pos: 0/10698352
> >>>>     resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
> >>>>     act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
> >>>
> >>> drbdsetup 9 disconnect --force
> >>> may work,
> >>> if you did not try a non-forced disconnect or similar before,
> >>> that is to say, if the drbd worker thread is not blocked yet.
> >>>
> >>
> >> I think I had tried a non-forced disconnect previously (and perhaps
> >> also implicitly as part of a 'down' attempt, though I'm not sure
> >> whether it would have gotten to that step if the disconnect operation
> >> didn't complete), but 'drbdsetup 9 disconnect --force' also just
> >> hangs.
> >>
> >>> You can always cut the tcp connection using iptables,
> >>> which should at least get the worker into a responsive state again.
> >>>
> >>
> >> As mentioned in another message in response to Florian, blocking
> >> the replication port via iptables doesn't seem to have had any
> >> effect.
> >
> > You would not need to "block" as in DROP, but to REJECT with tcp reset.
> >
>
> Sorry, should have worded that more carefully -- it wasn't strictly a
> DROP, but a REJECT, with icmp-port-unreachable. I've since tweaked it
> to reject with tcp-reset instead. No changes in DRBD state on either
> side as far as I can see, though both nodes still seem to have (or at
> least think they have) a tcp connection or two involving that port,
> despite my efforts with iptables:
>
> [root at node1 ~]# netstat -tn | fgrep 7789
> tcp 156 0 192.168.1.2:37324 192.168.1.1:7789 ESTABLISHED
> tcp 0 0 192.168.1.2:7789 192.168.1.1:47548 ESTABLISHED
>
> [root at node2 ~]# netstat -tn | fgrep 7789
> tcp 0 0 192.168.1.1:47548 192.168.1.2:7789 ESTABLISHED

hu? should be two as well?

anyways:

port=7789;
for chain in INPUT OUTPUT ; do
for direction in dport sport ; do
	iptables -I $chain -p tcp --$direction $port -j REJECT --reject-with tcp-reset
done
done

should do it usually.

> For what it's worth, node1 is the one that thinks the resource is
> Secondary/Secondary, node2 is the one that shows it as
> Secondary/Primary.
>
> > You could also try "ifdown" sleep a while and ifup again.
> > (which obviously will impact the other resources, and everything going
> > via that interface).
> >
>
> Right, but I don't really want to disrupt replication on all the other
> (production) resources, so just leaving it as-is until my next
> scheduled maintenance reboot (at which point I plan on would be
> preferable.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
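
The iptables loop above only inserts rules. As long as they stay in place, traffic on the replication port keeps getting rejected, so they presumably have to be deleted again before the resource can reconnect. A minimal sketch of a wrapper that handles both directions, assuming the four rules are still present exactly as inserted: the function name drbd_reject_rules and its "apply" argument are made up for illustration and are not from this thread. By default it only prints the iptables commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): build the same four
# tcp-reset REJECT rules as the loop above, for insertion or deletion.
# usage: drbd_reject_rules <-I|-D> <port> [apply]
drbd_reject_rules() {
    action=$1          # -I inserts the rules, -D deletes them again
    port=$2            # DRBD replication port, 7789 in this thread
    mode=${3:-print}   # anything other than "apply" is a dry run

    for chain in INPUT OUTPUT; do
        for direction in dport sport; do
            # Assemble one rule as an argument list.
            set -- iptables "$action" "$chain" -p tcp "--$direction" "$port" \
                -j REJECT --reject-with tcp-reset
            if [ "$mode" = apply ]; then
                "$@"        # really run iptables (needs root)
            else
                echo "$@"   # dry run: only show the command
            fi
        done
    done
}

# Dry run: show the rules that would be inserted, then the matching deletes.
drbd_reject_rules -I 7789
drbd_reject_rules -D 7789
```

Running "drbd_reject_rules -D 7789 apply" as root would then remove the rules; "iptables -S" can be used first to confirm they are still listed in the same form.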