Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Personally I use drbdadm status, and the output seems the same. The
instructions I provided were from an older version of DRBD which I have
not fully updated yet.
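For reference, a rough sketch of the DRBD 9 style status commands, assuming a
resource named "storage" and a peer named "node2" (both placeholders); the
exact output layout depends on the drbd-utils version:

  # one-shot status, roughly what cat /proc/drbd used to show
  drbdadm status storage
  # storage role:Primary
  #   disk:UpToDate
  #   node2 role:Secondary
  #     peer-disk:UpToDate

  # continuously updating view, similar to "watch cat /proc/drbd"
  watch drbdsetup status --verbose --statistics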
On 4 October 2017 at 11:37, Martyn Spencer <mdsreg_linbit at microdata.co.uk> wrote:
> Hi Jay,
>
> Thank you for your very detailed notes - they are very helpful. Out of
> interest, is using cat /proc/drbd still useful with drbd 9? Would watching
> drbdsetup status be the preferred equivalent now?
>
> Many thanks,
>
> Martyn
>
>
> On 03/10/17 10:02, Jason Fitzpatrick wrote:
>>
>> Hi Martyn,
>>
>> To fix connectivity issues with DRBD:
>>
>> Open up two SSH sessions to each node.
>>
>> In one SSH session on each node run the following command:
>>
>> watch cat /proc/drbd
>>
>> This will allow you to monitor the status of the nodes as they attempt
>> to reconnect.
>>
>> The node that states that it is secondary should show something like:
>>
>> 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown
>>
>> and the primary should look like this:
>>
>> 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---
>>
>> If you are using heartbeat to control your DRBD, you should stop it.
>>
>> (You can use the resource name in place of "all" below if you are running
>> more than one DRBD device and only one is broken.)
>>
>> On both nodes type:
>>
>> drbdadm down all
>> drbdadm up all
>>
>> Both nodes will probably report that they are in a secondary state now.
>> Make one primary (the one that you believe is the latest, or the one
>> that previously reported that it was primary):
>>
>> drbdadm primary all
>>
>> and then on both nodes:
>>
>> drbdadm connect all
>>
>> If that does not work you will have to outdate the secondary node.
>>
>> On the secondary:
>>
>> drbdadm outdate all
>>
>> and then try the connection again on both nodes:
>>
>> drbdadm connect all
>>
>> If this does not work you should invalidate the secondary node and
>> retry the connection.
>>
>> If at this point you are still unable to get the nodes to talk to each
>> other, check for a split-brain situation. Run:
>>
>> dmesg | grep drbd
>>
>> and look along the last few lines for:
>>
>> drbd0: Split-Brain detected, dropping connection!
>>
>> If this is there you will have to sacrifice the data on one of the nodes.
>>
>> Choose the node that you feel is incorrect (if you followed the above,
>> it is your secondary node) and run:
>>
>> drbdadm -- --discard-my-data connect all
>>
>> and on the primary:
>>
>> drbdadm connect all
>> drbdadm primary all
>>
>> You should see that both nodes connect and are syncing again.
>>
>> If you are using heartbeat you will have to get the cluster back into
>> its correct config. On both nodes:
>>
>> drbdadm down all
>> service drbd stop
>> service heartbeat start
>>
>> DRBD will be stopped and restarted by heartbeat. It will take some time
>> to restart heartbeat depending on your timeout settings, but once it
>> comes back up you should see, in your "watch cat /proc/drbd" window,
>> that one node has gone primary and is in sync.
>>
>> The following will make the current DRBD system secondary and ditch the
>> split-brain data in one go (the remote host has to be added to the hosts
>> file and a passwordless login should be set up before doing this):
>>
>> drbdadm -- --discard-my-data connect storage
>> ssh remote "drbdadm connect all"
>>
>> You can also add the following to your DRBD resource config for automated
>> split-brain recovery:
>>
>> resource <resource> {
>>   handlers {
>>     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>     ...
>>   }
>>   net {
>>     after-sb-0pri discard-zero-changes;
>>     after-sb-1pri discard-secondary;
>>     after-sb-2pri disconnect;
>>     ...
>>   }
>>   ...
>> }
>>
>> It should now be possible to use drbdmanage to do this for you:
>>
>> drbdmanage net-options --resource storage --after-sb-0pri discard-zero-changes \
>>   --after-sb-1pri discard-secondary --after-sb-2pri disconnect
>> drbdmanage handlers --resource storage --split-brain /usr/lib/drbd/notify-split-brain.sh
>>
>> Once you have confirmed that the data is valid you can scrub the
>> drbdmanage configuration with the drbdmanage uninit command. Please
>> ensure that you have enough valid nodes in your drbdmanage cluster to
>> have quorum and to allow the services to start.
>>
>> I use the following to quickly blow away the local configuration from a
>> node.
>>
>> Scrub DRBD configuration from a node
>>
>> On the broken node:
>>
>> drbdadm down all
>> drbdadm down .drbdctrl
>> drbdmanage uninit
>> vgremove drbdpool    # if you get an error here, reboot the server or
>>                      # check pvscan for additional volumes mapped by
>>                      # lvmonitor incorrectly
>> vgcreate drbdpool /dev/sdb
>>
>> On the working node:
>>
>> drbdmanage rn nodename.domain.name --force
>> drbdmanage an nodename.domain.name 10.x.x.x
>>
>> Jay
>>
>> On 2 October 2017 at 11:37, Martyn Spencer
>> <msdreg_linbit at microdata.co.uk> wrote:
>>>
>>> I am testing a three node DRBD 9.0.9 setup using packages I built for
>>> CentOS 7. I am using the latest drbdmanage and drbd-utils versions. If I
>>> lose the data on the resources, it is fine (I am only testing), but I
>>> wanted to learn how to manage (if possible) the mess that I have just
>>> caused :)
>>>
>>> Two nodes were working fine; let's call them node1 and node2.
>>>
>>> When I attempted to add node3, without storage, it failed. This is
>>> something I will worry about later.
>>>
>>> I managed to put node1 into a state where it had pending actions that I
>>> could not remove, so I decided to remove the node and then re-add it.
>>> Rather naively I did not check, and the DRBD resources were all
>>> role:primary on node1. Now node1 is in a state "pending: remove" and I
>>> cannot in any way seem to add it back to the cluster. If I use
>>> list-assignments, I can see that the resources all have pending actions
>>> "decommission" against node1. I am quite clear that DRBD is doing
>>> exactly what I asked it to do, and it also looks as though it is
>>> protecting me from my own mistakes somewhat (since the underlying DRBD
>>> resources appear to be OK).
>>>
>>> I would like to ensure that the data that is in the resources on node1
>>> is synchronised with node2 before doing anything else. At present, all
>>> the node1 resources are showing as "UpToDate" and "connecting", and the
>>> node2 resources are showing as "Outdated" and they are not attempting
>>> to reconnect to node1.
>>>
>>> Is there a way to force them to connect to node1 to resynchronise
>>> before I continue?
>>>
>>> Many thanks,
>>>
>>> Martyn Spencer

--
"The only difference between saints and sinners is that every saint has a
past while every sinner has a future." — Oscar Wilde
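For completeness, a rough sketch of how the split-brain handler and after-sb
policies from Jay's reply fit into a full, manually managed resource file
(for drbdmanage-managed resources the equivalent settings are applied with
the drbdmanage commands shown above); the resource name, hostnames, devices
and addresses below are placeholders:

  # e.g. /etc/drbd.d/storage.res (placeholder path and values)
  resource storage {
    handlers {
      # notify root by mail when a split brain is detected
      split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    net {
      # automatic split-brain recovery policies
      after-sb-0pri discard-zero-changes;
      after-sb-1pri discard-secondary;
      after-sb-2pri disconnect;
    }
    on node1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.1:7789;
      meta-disk internal;
    }
    on node2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7789;
      meta-disk internal;
    }
  }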