Hi Martyn,

To fix connectivity issues with DRBD, open two SSH sessions, one to each
node, and in each run:

    watch cat /proc/drbd

This lets you monitor the status of the nodes as they attempt to
reconnect. The node that believes it is secondary should show something
like:

    0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown

and the primary should look like this:

    0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---

If you are using heartbeat to control your DRBD you should stop it first.

On both nodes type (you can use the resource name in place of "all" here
and below if you are running more than one DRBD device and only one is
broken):

    drbdadm down all
    drbdadm up all

Both nodes will probably report that they are in a secondary state. Now
make one node primary (the one that you believe has the latest data, or
the one that previously reported that it was primary):

    drbdadm primary all

and then on both nodes:

    drbdadm connect all

If that does not work you will have to outdate the secondary node. On the
secondary:

    drbdadm outdate all

and then try the connection again on both nodes:

    drbdadm connect all

If this does not work you should invalidate the secondary node
(drbdadm invalidate all) and retry the connection.

If at this point you are still unable to get the nodes to talk to each
other, check for a split-brain situation. Run:

    dmesg | grep drbd

and look along the last few lines for:

    drbd0: Split-Brain detected, dropping connection!

If this is there you will have to sacrifice the data on one of the nodes.
Choose the node that you feel is incorrect (if you followed the above it
is your secondary node) and run:

    drbdadm -- --discard-my-data connect all

and on the primary:

    drbdadm connect all
    drbdadm primary all

You should see that both nodes connect and are syncing again.

If you are using heartbeat you will have to get the cluster back into its
correct config. On both nodes:

    drbdadm down all
    service drbd stop
    service heartbeat start

DRBD will be stopped and restarted by heartbeat. It will take some time
for heartbeat to restart depending on your timeout settings, but once it
comes back up you should see output in your "watch cat /proc/drbd" window
stating that one node has gone primary and is in sync.

The following will make the current DRBD system secondary and ditch the
split-brain data in one go ("remote" has to be added to the hosts file
and a passwordless login should be set up before doing this):

    drbdadm -- --discard-my-data connect storage
    ssh remote "drbdadm connect all"

You can also add the following to your DRBD resource config for automated
split-brain recovery:

    resource <resource> {
      handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        ...
      }
      net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        ...
      }
      ...
    }
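If you would rather script the manual recovery than rely on the automatic
policies, the decision can be wrapped in a few lines of shell. The
following is only a rough sketch, not a supported tool: it assumes it is
run on the node whose data you are prepared to discard, that "peer" is an
SSH alias you have set up for the surviving node, and that the kernel log
still contains the message shown above.

    #!/bin/bash
    # Sketch: after a detected split brain, discard the local data on
    # THIS node and reconnect both sides. "peer" is a hypothetical SSH
    # alias for the surviving node.
    set -e
    if dmesg | grep -q "Split-Brain detected"; then
        drbdadm secondary all                     # victim must not be primary
                                                  # (fails if the device is in use)
        drbdadm -- --discard-my-data connect all  # throw away local changes
        ssh peer "drbdadm connect all"            # reconnect from the survivor
    else
        echo "No split-brain message found; try the outdate/invalidate steps first."
    fi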
It should now be possible to use drbdmanage to do this for you:

    drbdmanage net-options --resource storage \
        --after-sb-0pri discard-zero-changes \
        --after-sb-1pri discard-secondary \
        --after-sb-2pri disconnect
    drbdmanage handlers --resource storage \
        --split-brain /usr/lib/drbd/notify-split-brain.sh

Once you have confirmed that the data is valid you can scrub the
drbdmanage configuration with the "drbdmanage uninit" command. Please
ensure that you have enough valid nodes left in your drbdmanage cluster
to have quorum and to allow the services to start.

I use the following to quickly blow away the local configuration from a
node (a consolidated sketch of this procedure follows after the quoted
message below).

Scrub the DRBD configuration from a node

On the broken node:

    drbdadm down all
    drbdadm down .drbdctrl
    drbdmanage uninit
    # if vgremove gives an error here, reboot the server or check pvscan
    # for additional volumes mapped incorrectly by lvmonitor
    vgremove drbdpool
    vgcreate drbdpool /dev/sdb

On the working node:

    drbdmanage rn nodename.domain.name --force
    drbdmanage an nodename.domain.name 10.x.x.x

Jay

On 2 October 2017 at 11:37, Martyn Spencer
<msdreg_linbit at microdata.co.uk> wrote:
> I am testing a three node DRBD 9.0.9 setup using packages I built for
> CentOS7. I am using the latest drbdmanage and drbd-utils versions. If I
> lose the data on the resources, it is fine (I am only testing) but I
> was wanting to learn how to manage (if possible) the mess that I have
> just caused :)
>
> Two nodes were working fine; let's call them node1 and node2.
>
> When I attempted to add node3, without storage, it failed. This is
> something I will worry about later.
>
> I managed to put node1 into a state where it had pending actions that I
> could not remove, so decided to remove the node and then re-add it.
> Rather naively I did not check and the DRBD resources were all
> role:primary on node1. Now node1 is in a state "pending: remove" and I
> cannot in any way seem to add it back to the cluster. If I use
> list-assignments, I can see that the resources all have pending actions
> "decommission" against node1. I am quite clear that DRBD is doing
> exactly what I asked it to do, and it also looks as though it is
> protecting me from my own mistakes somewhat (since the underlying DRBD
> resources appear to be OK).
>
> I would like to ensure that the data that is in the resources on node1
> is synchronised with node2 before doing anything else. At present, all
> the node1 resources are showing as "UpToDate" and "connecting" and the
> node2 resources are showing as "Outdated" and they are not attempting
> to reconnect to node1.
>
> Is there a way to force them to connect to node1 to resynchronise
> before I continue?
>
> Many thanks,
>
> Martyn Spencer
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
"The only difference between saints and sinners is that every saint has
a past while every sinner has a future." — Oscar Wilde
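As promised above, here is the scrub-and-re-add procedure as a single
script run from the working node. This is a rough sketch under some
assumptions, not an official tool: node1.example.com and 10.0.0.1 are
placeholders for the broken node's name and address, /dev/sdb is assumed
to be the backing device for drbdpool, and root SSH access from the
working node to the broken one is assumed. drbdmanage uninit and vgremove
may prompt for confirmation, hence the -t flag to ssh.

    #!/bin/bash
    # Sketch: scrub drbdmanage state from a broken node and re-add it to
    # the cluster. Run on the working node. All names are placeholders.
    set -e
    BROKEN=node1.example.com   # hypothetical FQDN of the broken node
    BROKEN_IP=10.0.0.1         # hypothetical address of the broken node
    PV=/dev/sdb                # assumed backing device for drbdpool

    # Tear down DRBD and the drbdmanage control volume on the broken
    # node, then recreate an empty drbdpool volume group.
    ssh -t root@"$BROKEN" "drbdadm down all &&
                           drbdadm down .drbdctrl &&
                           drbdmanage uninit &&
                           vgremove drbdpool &&
                           vgcreate drbdpool $PV"

    # Drop the stale node entry and add it back ("rn" and "an" are the
    # short aliases for remove-node and add-node used above).
    drbdmanage rn "$BROKEN" --force
    drbdmanage an "$BROKEN" "$BROKEN_IP"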