[DRBD-user] Antwort: Re: question about start drbd on single node after a power outage

Wed Feb 1 19:58:05 CET 2012

Thanks Robert and Kaloyan.
I used the same command Robert mentioned when I resolved the problem manually.

I tried the automatic split-brain recovery settings.
They seem too risky for some scenarios, so I will leave things as they are for now
and resolve manually if needed rather than rely on a risky automatic process.

Thanks a lot, Kaloyan, your info is very helpful.

-----Original Message-----
From: Kaloyan Kovachev [mailto:kkovachev at varna.net] 
Sent: January-31-12 11:12 AM
To: Robert.Koeppl at knapp.com
Cc: Xing, Steven; drbd-user at lists.linbit.com; Digimer
Subject: Re: Antwort: Re: [DRBD-user] question about start drbd on single node after a power outage

On Tue, 31 Jan 2012 16:53:55 +0100, Robert.Koeppl at knapp.com wrote:
> Hi!
> You can force the node into primary by issuing
> "drbdadm -- --overwrite-data-of-peer primary all". The consequences for
> the other node when it comes up again are fairly self-explanatory.
> If you issue "drbdadm disconnect all" after that, you make sure it does
> not overwrite the other node when it comes up, so you can do some recovery
> on the other node in case you are missing some data and are lucky.
> "drbdadm connect all" should reestablish the connection later on.
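
The sequence Robert describes can be sketched as a few shell functions. This is only an illustration: the function names and the DRBDADM override variable are made up here, and only the three drbdadm invocations themselves come from the message above.

```shell
#!/bin/sh
# Sketch of the manual single-node recovery sequence quoted above.
# DRBDADM is a hypothetical override so the command can be stubbed for a dry run.
DRBDADM="${DRBDADM:-drbdadm}"

# Promote this node although the peer's state is unknown.
# WARNING: the peer's data is overwritten at the next resync.
force_primary() {
    "$DRBDADM" -- --overwrite-data-of-peer primary "${1:-all}"
}

# Stay disconnected so the peer is not overwritten as soon as it boots.
hold_off_peer() {
    "$DRBDADM" disconnect "${1:-all}"
}

# Reconnect once anything needed has been salvaged from the peer.
reconnect_peer() {
    "$DRBDADM" connect "${1:-all}"
}
```

After sourcing these definitions, something like "force_primary vksshared && hold_off_peer vksshared" would bring the surviving node up in isolation, and "reconnect_peer vksshared" would be run later, once the other node has been inspected.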

You may also use 'invalidate-remote' ... keep in mind that a full sync will take place. Taking a snapshot in the 'before-resync-target' handler may help restore some data without disconnecting.
Since you are absolutely sure that's what you want, for automatic recovery use:
after-sb-0pri discard-younger-primary
after-sb-1pri discard-secondary
after-sb-2pri violently-as0p
rr-conflict   violently
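
For reference, a minimal sketch of where these policies would sit in drbd.conf, assuming DRBD 8.3 syntax and the resource name vksshared from the status output in this thread. The option names are spelled as in the drbd.conf man page (after-sb-0pri, rr-conflict); the rest of the resource definition is omitted.

```
resource vksshared {
    net {
        # automatic split-brain policies; these can silently discard data
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri violently-as0p;
        rr-conflict   violently;
    }
    # disk, syncer and "on <host>" sections omitted
}
```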

> Mit freundlichen Grüßen / Best Regards
> 
> Robert Köppl
> 
> Customer Support & Projects
> Teamleader IT Support
> 
> KNAPP Systemintegration GmbH
> Waltenbachstraße 9
> 8700 Leoben, Austria
> Phone: +43 3842 805-322
> Fax: +43 3842 82930-500
> robert.koeppl at knapp.com
> www.KNAPP.com
> 
> Commercial register number: FN 138870x Commercial register court: 
> Leoben
> 
> "Xing, Steven" <SXing at BroadViewNet.com>
> Sent by: drbd-user-bounces at lists.linbit.com
> 31.01.2012 16:46
>
> To: "Kaloyan Kovachev" <kkovachev at varna.net>, "Digimer" <linux at alteeve.com>
> Cc: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] question about start drbd on single node after a power outage
>
> You were both fast to respond ;)
> Thanks all. I understand that there is a high risk of split brain and data
> loss if I force one node to come up.
> But in my case, service recovery is much more important than data loss.
>
> I have a 2-node pacemaker cluster, with drbd running as master/slave
> (controlled by pacemaker).
> When power is lost on both nodes (just like unplugging the power cable from
> both nodes at the same time),
> there seems to be no chance for drbd to do anything (such as fence the peer).
> Thus, after I boot up only the previous primary node,
> I see pacemaker trying to promote drbd to master, but it fails;
> when I run drbdadm status, it shows:
> 
> <drbd-status version="8.3.11" api="88">
> <resources config_file="/etc/drbd.conf">
> <resource minor="0" name="vksshared" cs="WFConnection" ro1="Secondary" 
> ro2="Unknown" ds1="Consistent" ds2="DUnknown" />
> </resources>
> </drbd-status>
> 
> I tried setting both wfc timeouts to 5 sec, but that did not work.
>
> As you can see, the drbd service started but cannot be promoted, since
> ds1="Consistent"; only "UpToDate" will work.
> Only when I boot up the other node, right after the 2 drbd instances
> connect, can drbd declare the disk status as "UpToDate".
>
> Without booting the other node, I have not found an automatic way, whether
> safe or not, to force it to consider itself "UpToDate".
> It seems drbd does not remember its previous running state (primary or
> slave), for safety reasons.
>
> Do you have any ideas or comments on this? I looked into the docs but could
> not find any setting that does this, even an unsafe one.
> If I upgrade drbd to the latest version, will it help?
> 
> 
> -----Original Message-----
> From: Kaloyan Kovachev [mailto:kkovachev at varna.net] 
> Sent: January-31-12 9:35 AM
> To: Digimer
> Cc: Xing, Steven; drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] question about start drbd on single node after a
> power outage
> 
> You were faster than me :)
> 
> On Tue, 31 Jan 2012 09:04:49 -0500, Digimer <linux at alteeve.com> wrote:
>> 
>> If you want to force the issue though, you can use 'wfc-timeout 300'
>> which will tell DRBD to wait up to 5 minutes for its peer. After that
>> time, it will consider itself primary. Please don't use this though until
>> you've exhausted all other ways of starting safely.
> 
> There are two (well documented) options in drbd.conf: wfc-timeout and
> degr-wfc-timeout. To avoid split-brain I set both to 0.
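
As a rough illustration of that setting, and assuming DRBD 8.3 syntax, both timeouts live in the startup section of drbd.conf. A value of 0 means the init script waits for the peer without a time limit.

```
startup {
    # 0 = wait for the peer indefinitely instead of giving up after a timeout
    wfc-timeout      0;
    degr-wfc-timeout 0;
}
```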
> 
> If you need to skip waiting, you can do this manually from the console
> in case you start drbd standalone or before cman / pacemaker.
> 
> In my case it is exported via iSCSI (not as a cluster resource), so I have an
> additional wait loop for both nodes to become UpToDate for _all_
> configured resources before exporting any of them. 'No data' is better
> than 'broken data'. Yes, I have been bitten by the latter (luckily
> during the preproduction phase), and believe me, you don't want that on
> production nodes (unless you have static read-only data).
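
A wait loop of the kind Kaloyan mentions might look roughly like the sketch below. It is not his script: the function names, the DRBDADM override and the polling interval are assumptions made for illustration, and it assumes that "drbdadm dstate all" prints one local/peer pair such as UpToDate/UpToDate per configured resource.

```shell
#!/bin/sh
# Sketch: block until every DRBD resource is UpToDate on both nodes.
# DRBDADM is a hypothetical override so the command can be stubbed for testing.
DRBDADM="${DRBDADM:-drbdadm}"

# Succeeds only if every line printed by "drbdadm dstate all"
# is exactly UpToDate/UpToDate (local disk state / peer disk state).
all_uptodate() {
    states=$("$DRBDADM" dstate all 2>/dev/null) || return 1
    [ -n "$states" ] || return 1
    # grep -v finds any line that is NOT an exact match; negate its result
    ! printf '%s\n' "$states" | grep -qvx 'UpToDate/UpToDate'
}

# Poll until all resources are in sync; $1 is the interval in seconds.
wait_for_uptodate() {
    interval="${1:-5}"
    until all_uptodate; do
        sleep "$interval"
    done
}
```

A start script would call "wait_for_uptodate" before exporting anything over iSCSI. The loop deliberately has no timeout, so it never exports while the peer is missing, in line with the 'no data is better than broken data' stance above.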
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user