[DRBD-user] Cancelling pending actions

Jason Fitzpatrick jayfitzpatrick at gmail.com
Wed Oct 4 13:33:23 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Personally I use drbdadm status, and the output it gives is much the same.

The instructions I provided were written for an older version of DRBD and
I have not fully updated them yet.
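
For reference, a rough DRBD 9 equivalent of watching /proc/drbd (the
one-second interval below is just an example) would be:

watch -n1 drbdadm status
drbdsetup status --verbose --statistics   # more detail and transfer statistics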

On 4 October 2017 at 11:37, Martyn Spencer
<mdsreg_linbit at microdata.co.uk> wrote:
> Hi Jay,
>
> Thank you for your very detailed notes - they are very helpful. Out of
> interest, is using cat /proc/drbd still useful with drbd 9? Would watching
> drbdsetup status be the preferred equivalent now?
>
> Many thanks,
>
> Martyn
>
>
> On 03/10/17 10:02, Jason Fitzpatrick wrote:
>>
>> Hi Martyn,
>>
>> To fix connectivity issues with DRBD
>>
>> open up two SSH sessions to each node
>>
>> in one SSH session on each node run the following command
>>
>> watch cat /proc/drbd
>> this will allow you to monitor the status of the nodes as they attempt
>> to reconnect
>>
>> on the node that reports that it is Secondary you should see something
>> like:
>>
>> 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown
>>
>> and the Primary should look like this:
>>
>> 0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown   r---
>>
>> if you are using heartbeat to control your drbd you should stop it
>>
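>> (a minimal sketch; on init-script based setups this is simply:)
>>
>> service heartbeat stop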
>>
>> (you can use the resource name here if you are running more than one
>> DRBD device and only one is broken)
>>
>> on both nodes type:
>>
>> drbdadm down all
>> drbdadm up all
>> both nodes will probably report that they are in a Secondary state now.
>> make one of them Primary (the one that you believe holds the latest data,
>> or the one that previously reported that it was Primary):
>>
>> drbdadm primary all
>> and then on both nodes
>>
>> drbdadm connect all
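>>
>> (as noted above, a single resource name can be used instead of "all"; a
>> minimal sketch assuming a resource called "r0":)
>>
>> drbdadm down r0
>> drbdadm up r0
>> drbdadm primary r0   # only on the node with the good data
>> drbdadm connect r0   # on both nodes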
>> if that does not work you will have to outdate the secondary node
>>
>> on secondary:
>>
>> drbdadm outdate all
>> and then try the connection again on both nodes
>>
>> drbdadm connect all
>> if this does not work you should invalidate the secondary node and
>> retry the connection
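>>
>> (a minimal sketch; invalidate forces a full resync from the peer, so run
>> it only on the node whose data you are willing to throw away)
>>
>> on secondary:
>>
>> drbdadm invalidate all
>> drbdadm connect all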
>>
>> if at this point you are unable to get the nodes to talk to each other,
>> check for a split-brain situation.
>>
>> run
>>
>> dmesg |grep drbd
>> and have a look along the last few lines for
>>
>> drbd0: Split-Brain detected, dropping connection!
>>
>> if this is there you will have to sacrifice data on one of the nodes
>>
>> choose the node that you feel is incorrect (if you followed the above
>> it is your secondary node)
>>
>> and run
>>
>> drbdadm -- --discard-my-data connect all
>>
>>
>> and on the primary
>>
>> drbdadm connect all
>> drbdadm primary all
>>
>> and you should see that both nodes connect and are syncing again
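>>
>> (the resync progress can then be followed from the watch window, e.g.:)
>>
>> cat /proc/drbd   # look for cs:SyncSource / cs:SyncTarget and the progress counters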
>>
>>
>> if you are using heartbeat you will have to get the cluster back into
>> its correct config
>>
>> on both nodes
>>
>> drbdadm down all
>> service drbd stop
>> service heartbeat start
>> DRBD will be stopped and restarted by heartbeat. It will take some
>> time to restart heartbeat, depending on your timeout settings, but once
>> it comes back up you should see output in your watch cat
>> /proc/drbd window showing that one node has gone Primary and is in
>> sync.
>>
>>
>> the following will make the current DRBD node Secondary and ditch its
>> split-brain data in one go ("remote" has to be added to the hosts file
>> and passwordless SSH login should be set up before doing this)
>>
>>
>> drbdadm -- --discard-my-data connect storage
>> ssh remote "drbdadm connect all"
>>
>> you can also add the following to your drbd resource config for
>> automated split brain recovery
>>
>> resource <resource> {
>>   handlers {
>>     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>>     ...
>>   }
>>   net {
>>     after-sb-0pri discard-zero-changes;
>>     after-sb-1pri discard-secondary;
>>     after-sb-2pri disconnect;
>>     ...
>>   }
>>   ...
>> }
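>>
>> (a minimal sketch for applying the changed net options to a running
>> resource without taking it down, assuming the resource is called "storage":)
>>
>> drbdadm adjust storage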
>>
>> it should now be possible to use drbdmanage to do this for you
>>
>> drbdmanage net-options --resource storage \
>>     --after-sb-0pri discard-zero-changes \
>>     --after-sb-1pri discard-secondary \
>>     --after-sb-2pri disconnect
>> drbdmanage handlers --resource storage \
>>     --split-brain /usr/lib/drbd/notify-split-brain.sh
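>>
>> (to double-check what the resource ended up with, the running
>> configuration can be dumped, e.g.:)
>>
>> drbdsetup show storage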
>>
>>
>> Once you have confirmed that the data is valid, you can scrub the
>> drbdmanage configuration with the drbdmanage uninit command. Please
>> ensure that you have enough valid nodes left in your drbdmanage cluster
>> to have quorum and to allow the services to start.
>>
>> I use the following to quickly blow away the local configuration from a
>> node
>>
>> Scrub DRBD Configuration from a node
>> On the broken node:
>>
>> drbdadm down all
>> drbdadm down .drbdctrl
>> drbdmanage uninit
>> vgremove drbdpool # if you get an error here please reboot the server
>>                   # or check pvscan for additional volumes mapped by lvmonitor incorrectly
>> vgcreate drbdpool /dev/sdb
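>>
>> (as mentioned in the comment above, leftover mappings can be checked
>> before retrying vgremove, e.g.:)
>>
>> pvscan
>> vgs
>> dmsetup ls   # device-mapper entries that may still be holding the volume group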
>>
>>
>> On the working node
>>
>> drbdmanage rn nodename.domain.name --force
>> drbdmanage an nodename.domain.name 10.x.x.x
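>>
>> (rn and an appear to be short aliases; the long-form equivalents, with the
>> same placeholder node name and address, would be:)
>>
>> drbdmanage remove-node nodename.domain.name --force
>> drbdmanage add-node nodename.domain.name 10.x.x.x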
>>
>> Jay
>>
>> On 2 October 2017 at 11:37, Martyn Spencer
>> <msdreg_linbit at microdata.co.uk> wrote:
>>>
>>> I am testing a three node DRBD 9.0.9 setup using packages I built for
>>> CentOS7. I am using the latest drbdmanage and drbd-utils versions. If I
>>> lose the data on the resources, it is fine (I am only testing) but I was
>>> wanting to learn how to manage (if possible) the mess that I have just
>>> caused :)
>>>
>>> Two nodes were working fine; let's call them node1 and node2.
>>>
>>> When I attempted to add node3, without storage, it failed. This is
>>> something I will worry about later.
>>>
>>> I managed to put node1 into a state where it had pending actions that I
>>> could not remove, so decided to remove the node and then re-add it. Rather
>>> naively I did not check and the DRBD resources were all role:primary on
>>> node1. Now node1 is in a state "pending: remove" and I cannot in any way
>>> seem to add it back to the cluster. If I use list-assignments, I can see
>>> that the resources all have pending actions "decommission" against node1.
>>> I am quite clear that DRBD is doing exactly what I asked it to do, and it
>>> also looks as though it is protecting me from my own mistakes somewhat
>>> (since the underlying DRBD resources appear to be OK).
>>>
>>> I would like to ensure that the data that is in the resources on node1 is
>>> synchronised with node2 before doing anything else. At present, all the
>>> node1 resources are showing as "UpToDate" and "connecting" and the node2
>>> resources are showing as "Outdated" and they are not attempting to
>>> reconnect to node1.
>>>
>>> Is there a way to force them to connect to node1 to resynchronise before
>>> I continue?
>>>
>>> Many thanks,
>>>
>>> Martyn Spencer
>>>



-- 

"The only difference between saints and sinners is that every saint
has a past while every sinner has a future. "
— Oscar Wilde


