[DRBD-user] Cancelling pending actions

Martyn Spencer mdsreg_linbit at microdata.co.uk
Wed Oct 4 12:37:39 CEST 2017



Hi Jay,

Thank you for your very detailed notes - they are very helpful. Out of 
interest, is using cat /proc/drbd still useful with drbd 9? Would 
watching drbdsetup status be the preferred equivalent now?
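(For what it's worth: in DRBD 9, /proc/drbd only reports version information, and `drbdadm status` / `drbdsetup status` — or `drbdsetup events2` for a streaming feed — are the usual replacements. A minimal sketch of pulling the local role out of status output in a script; the sample output below is illustrative only, not captured from a live cluster:)

```shell
# Sample of `drbdsetup status`-style output (illustrative; exact fields
# vary by DRBD 9 version). A live check would run the command instead.
sample='r0 role:Primary
  disk:UpToDate
  node2 role:Secondary
    peer-disk:UpToDate'

# Pull the local role out of the first line, as a monitoring script might:
role=$(printf '%s\n' "$sample" | awk 'NR==1 { sub("role:", "", $2); print $2 }')
echo "local role: $role"
```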

Many thanks,

Martyn

On 03/10/17 10:02, Jason Fitzpatrick wrote:
> Hi Martyn,
>
> To fix connectivity issues with DRBD:
>
> Open two SSH sessions to each node.
>
> In one SSH session on each node, run the following command:
>
> watch cat /proc/drbd
>
> This will let you monitor the status of the nodes as they attempt
> to reconnect.
>
> On the node that reports it is secondary, you should see something like:
>
> 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
>
> and the primary should look like this:
>
> 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---
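(As a side note, the cs:/st:/ds: fields can be split out of such a line with standard tools; a small sketch using an inlined sample line — a live check would read /proc/drbd instead:)

```shell
# Split the connection state (cs:) and roles (st:) out of a /proc/drbd
# status line. The line is inlined here for illustration.
line='0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---'

cs=$(printf '%s\n' "$line" | sed -n 's/.*cs:\([^ ]*\).*/\1/p')
st=$(printf '%s\n' "$line" | sed -n 's/.*st:\([^ ]*\).*/\1/p')
echo "connection=$cs roles=$st"
```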
>
> If you are using heartbeat to control your DRBD, you should stop it.
>
> (In the commands below you can use a resource name instead of "all" if
> you are running more than one DRBD device and only one is broken.)
>
> On both nodes type:
>
> drbdadm down all
> drbdadm up all
>
> Both nodes will probably report that they are in a secondary state now.
> Make one primary (the one that you believe has the latest data, or the
> one that previously reported it was primary):
>
> drbdadm primary all
>
> and then on both nodes:
>
> drbdadm connect all
>
> If that does not work, you will have to outdate the secondary node.
>
> On the secondary:
>
> drbdadm outdate all
>
> and then try the connection again on both nodes:
>
> drbdadm connect all
>
> If this does not work, you should invalidate the secondary node
> (drbdadm invalidate all) and retry the connection.
>
> If at this point you are unable to get the nodes to talk to each other,
> check for a split-brain situation.
>
> Run:
>
> dmesg | grep drbd
>
> and look along the last few lines for:
>
> drbd0: Split-Brain detected, dropping connection!
>
> If this is there, you will have to sacrifice data on one of the nodes.
>
> Choose the node that you feel is incorrect (if you followed the above,
> it is your secondary node)
>
> and run
>
> drbdadm -- --discard-my-data connect all
>
>
> and on the primary
>
> drbdadm connect all
> drbdadm primary all
>
> and you should see that both nodes connect and are syncing again
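(The dmesg check above can be scripted; a minimal sketch, with a sample log line inlined in place of live `dmesg` output:)

```shell
# Detect the split-brain marker in kernel log output.
# A real check would pipe `dmesg` in; a sample line is inlined here.
dmesg_sample='block drbd0: Split-Brain detected, dropping connection!'

if printf '%s\n' "$dmesg_sample" | grep -q 'Split-Brain detected'; then
  echo "split-brain detected"
else
  echo "no split-brain"
fi
```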
>
>
> If you are using heartbeat, you will have to get the cluster back into
> its correct configuration.
>
> on both nodes
>
> drbdadm down all
> service drbd stop
> service heartbeat start
> DRBD will be stopped and restarted by heartbeat. It may take some time
> to restart heartbeat, depending on your timeout settings, but once it
> comes back up you should see, in your "watch cat /proc/drbd" window,
> that one node has gone primary and is in sync.
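(The "wait until one node goes primary" step can also be polled from a script. A sketch, with get_status as a stand-in stub for `cat /proc/drbd` — replace it with the real command on a live node:)

```shell
# Poll a status source until a node reports Primary, with a timeout.
# get_status is a stub standing in for `cat /proc/drbd` (or, on DRBD 9,
# `drbdadm status`); its output here is a canned sample line.
get_status() {
  echo "0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate"
}

tries=0
until get_status | grep -q 'st:Primary'; do
  tries=$((tries + 1))
  [ "$tries" -ge 30 ] && { echo "timed out"; exit 1; }
  sleep 2
done
echo "a node is Primary"
```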
>
>
> The following will make the current DRBD system secondary and discard
> the split-brain data in one go (the remote host has to be added to the
> hosts file and passwordless login should be set up before doing this):
>
>
> drbdadm -- --discard-my-data connect storage
> ssh remote "drbdadm connect all"
>
> You can also add the following to your DRBD resource config for
> automated split-brain recovery:
>
> resource <resource> {
>   handlers {
>     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>     ...
>   }
>   net {
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>     ...
>   }
>   ...
> }
>
> It should now be possible to use drbdmanage to do this for you:
>
> drbdmanage net-options --resource storage --after-sb-0pri discard-zero-changes \
>   --after-sb-1pri discard-secondary --after-sb-2pri disconnect
> drbdmanage handlers --resource storage --split-brain /usr/lib/drbd/notify-split-brain.sh
>
>
> Once you have confirmed that the data is valid, you can scrub the
> drbdmanage configuration with the drbdmanage uninit command. Please
> ensure that you have enough valid nodes in your drbdmanage cluster to
> have quorum and to allow the services to start.
>
> I use the following to quickly blow away the local configuration from a node.
>
> Scrub the DRBD configuration from a node
> On the broken node:
>
> drbdadm down all
> drbdadm down .drbdctrl
> drbdmanage uninit
> vgremove drbdpool  # if you get an error here, reboot the server or
>                    # check pvscan for additional volumes mapped by
>                    # lvmonitor incorrectly
> vgcreate drbdpool /dev/sdb
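(A sketch of wrapping the scrub sequence in a script that defaults to a dry run — printing each command instead of executing it — since drbdpool and /dev/sdb are assumptions about the local layout:)

```shell
#!/bin/sh
# Dry-run by default: each step is printed, not executed.
# Set DRY_RUN= (empty) to actually run the commands on a broken node.
DRY_RUN=${DRY_RUN-1}

run() {
  echo "+ $*"
  [ -n "$DRY_RUN" ] || "$@"
}

run drbdadm down all
run drbdadm down .drbdctrl
run drbdmanage uninit
run vgremove drbdpool            # on error: reboot or check pvscan
run vgcreate drbdpool /dev/sdb   # /dev/sdb is an assumption; adjust
```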
>
>
> On the working node:
>
> drbdmanage rn nodename.domain.name --force   # rn = remove-node
> drbdmanage an nodename.domain.name 10.x.x.x  # an = add-node
>
> Jay
>
> On 2 October 2017 at 11:37, Martyn Spencer
> <msdreg_linbit at microdata.co.uk> wrote:
>> I am testing a three-node DRBD 9.0.9 setup using packages I built for
>> CentOS 7. I am using the latest drbdmanage and drbd-utils versions. If I lose
>> the data on the resources, that is fine (I am only testing), but I wanted
>> to learn how to manage (if possible) the mess that I have just caused :)
>>
>> Two nodes were working fine; let's call them node1 and node2.
>>
>> When I attempted to add node3, without storage, it failed. This is something
>> I will worry about later.
>>
>> I managed to put node1 into a state where it had pending actions that I
>> could not remove, so decided to remove the node and then re-add it. Rather
>> naively I did not check and the DRBD resources were all role:primary on
>> node1. Now node1 is in a state "pending: remove" and I cannot in any way
>> seem to add it back to the cluster. If I use list-assignments, I can see
>> that the resources all have pending actions "decommission" against node1. I
>> am quite clear that DRBD is doing exactly what I asked it to do, and it also
>> looks as though it is protecting me from my own mistakes somewhat (since the
>> underlying DRBD resources appear to be OK).
>>
>> I would like to ensure that the data that is in the resources on node1 is
>> synchronised with node2 before doing anything else. At present, all the
>> node1 resources are showing as "UpToDate" and "connecting" and the node2
>> resources are showing as "Outdated" and they are not attempting to reconnect
>> to node1.
>>
>> Is there a way to force them to connect to node1 to resynchronise before I
>> continue?
>>
>> Many thanks,
>>
>> Martyn Spencer
>>
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>



