Hi Jay,

Thank you for your very detailed notes - they are very helpful.

Out of interest, is using cat /proc/drbd still useful with DRBD 9? Would watching drbdsetup status be the preferred equivalent now?

Many thanks,

Martyn

On 03/10/17 10:02, Jason Fitzpatrick wrote:
> Hi Martyn..
>
> To fix connectivity issues with DRBD:
>
> Open two SSH sessions, one to each node.
>
> In the SSH session on each node, run the following command:
>
> watch cat /proc/drbd
>
> This will allow you to monitor the status of the nodes as they attempt
> to reconnect.
>
> The node that reports itself as secondary should show something like:
>
> 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown
>
> and the primary should look like this:
>
> 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---
>
> If you are using Heartbeat to control your DRBD, you should stop it.
>
> (You can use the resource name instead of "all" in the commands below
> if you are running more than one DRBD device and only one is broken.)
>
> On both nodes, type:
>
> drbdadm down all
> drbdadm up all
>
> Both nodes will probably report that they are in a secondary state now.
> Make one primary (the one that you believe is the latest, or the one
> that previously reported that it was primary):
>
> drbdadm primary all
>
> and then, on both nodes:
>
> drbdadm connect all
>
> If that does not work, you will have to outdate the secondary node.
>
> On the secondary:
>
> drbdadm outdate all
>
> and then try the connection again on both nodes:
>
> drbdadm connect all
>
> If this does not work, you should invalidate the secondary node and
> retry the connection.
>
> If at this point you are unable to get the nodes to talk to each other,
> check for a split-brain situation. Run:
>
> dmesg | grep drbd
>
> and look along the last few lines for:
>
> drbd0: Split-Brain detected, dropping connection!
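The dmesg check just described can be scripted. A minimal sketch, assuming only that the kernel message contains the phrase "Split-Brain detected"; the sample line is copied from the message above and stands in for live dmesg output:

```shell
# Hypothetical kernel-log line, used in place of live `dmesg` output:
sample='block drbd0: Split-Brain detected, dropping connection!'

# Flag a split-brain if the marker phrase appears in the log text.
if printf '%s\n' "$sample" | grep -q 'Split-Brain detected'; then
    echo 'split-brain: manual recovery needed'
fi
# prints "split-brain: manual recovery needed"
```

On a real node you would replace the printf with `dmesg | grep -q 'Split-Brain detected'`.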
>
> If this is there, you will have to sacrifice the data on one of the nodes.
>
> Choose the node that you feel is incorrect (if you followed the above,
> it is your secondary node) and run:
>
> drbdadm -- --discard-my-data connect all
>
> and on the primary:
>
> drbdadm connect all
> drbdadm primary all
>
> You should then see that both nodes connect and are syncing again.
>
> If you are using Heartbeat, you will have to get the cluster back into
> its correct config. On both nodes:
>
> drbdadm down all
> service drbd stop
> service heartbeat start
>
> DRBD will be stopped and restarted by Heartbeat. It will take some
> time to restart Heartbeat depending on your timeout settings, but once
> it comes back up you should see, in your "watch cat /proc/drbd" window,
> that one node has gone primary and is in sync.
>
> The following will make the current DRBD system secondary and ditch the
> split-brain data in one go (the remote host has to be added to the hosts
> file and a passwordless login should be set up before doing this):
>
> drbdadm -- --discard-my-data connect storage
> ssh remote "drbdadm connect all"
>
> You can also add the following to your DRBD resource config for
> automated split-brain recovery:
>
> resource <resource> {
>   handlers {
>     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>     ...
>   }
>   net {
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>     ...
>   }
>   ...
> }
>
> It should now be possible to use drbdmanage to do this for you:
>
> drbdmanage net-options --resource storage --after-sb-0pri discard-zero-changes --after-sb-1pri discard-secondary --after-sb-2pri disconnect
> drbdmanage handlers --resource storage --split-brain /usr/lib/drbd/notify-split-brain.sh
>
> Once you have confirmed that the data is valid, you can scrub the
> drbdmanage configuration with the drbdmanage uninit command. Please
> ensure that you have enough valid nodes in your drbdmanage cluster to
> have quorum and to allow the services to start.
>
> I use the following to quickly blow away the local configuration from
> a node.
>
> On the broken node:
>
> drbdadm down all
> drbdadm down .drbdctrl
> drbdmanage uninit
> vgremove drbdpool # if you get an error here, reboot the server or
>                   # check pvscan for additional volumes mapped by lvmonitor incorrectly
> vgcreate drbdpool /dev/sdb
>
> On the working node:
>
> drbdmanage rn nodename.domain.name --force
> drbdmanage an nodename.domain.name 10.x.x.x
>
> Jay
>
> On 2 October 2017 at 11:37, Martyn Spencer
> <msdreg_linbit at microdata.co.uk> wrote:
>> I am testing a three-node DRBD 9.0.9 setup using packages I built for
>> CentOS 7. I am using the latest drbdmanage and drbd-utils versions. If I
>> lose the data on the resources, that is fine (I am only testing), but I
>> wanted to learn how to manage (if possible) the mess that I have just
>> caused :)
>>
>> Two nodes were working fine; let's call them node1 and node2.
>>
>> When I attempted to add node3, without storage, it failed. This is
>> something I will worry about later.
>>
>> I managed to put node1 into a state where it had pending actions that I
>> could not remove, so I decided to remove the node and then re-add it.
>> Rather naively, I did not check first, and the DRBD resources were all
>> role:primary on node1.
>> Now node1 is in the state "pending: remove" and I cannot in any way
>> seem to add it back to the cluster. If I use list-assignments, I can see
>> that the resources all have the pending action "decommission" against
>> node1. I am quite clear that DRBD is doing exactly what I asked it to
>> do, and it also looks as though it is protecting me from my own
>> mistakes somewhat (since the underlying DRBD resources appear to be OK).
>>
>> I would like to ensure that the data in the resources on node1 is
>> synchronised with node2 before doing anything else. At present, all the
>> node1 resources are showing as "UpToDate" and "connecting", and the
>> node2 resources are showing as "Outdated"; they are not attempting to
>> reconnect to node1.
>>
>> Is there a way to force them to connect to node1 to resynchronise
>> before I continue?
>>
>> Many thanks,
>>
>> Martyn Spencer
>>
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
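On the /proc/drbd question at the top of the thread: in DRBD 9, /proc/drbd no longer carries per-resource state, so `drbdadm status` or `drbdsetup status` (optionally under watch) is the usual replacement for `watch cat /proc/drbd`. A minimal sketch; the status line below is a hypothetical, abbreviated example, since real output depends on a live cluster:

```shell
# On a live node you would run, for example:
#   watch drbdsetup status      # continuous view, like watch cat /proc/drbd
#   drbdadm status storage      # one-shot view of a single resource
# The line below stands in for live status output and shows how the
# disk state could be picked out of it:
sample='storage role:Primary disk:UpToDate'
printf '%s\n' "$sample" | sed -n 's/.*disk:\([A-Za-z]*\).*/\1/p'
# prints "UpToDate"
```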