[DRBD-user] Antwort: Re: question about start drbd on single node after a power outage
SXing at BroadViewNet.com
Wed Feb 1 19:58:05 CET 2012
Thanks Robert and Kaloyan.
I used the same command Robert mentioned when I solved the problem manually.
I tried the split-brain auto-resolve settings, but they seem too risky for
some scenarios, so I will leave them off for now and do a manual resolve
when needed instead of a risky automatic process.
Thanks a lot, Kaloyan, your info is very helpful.
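For reference, the manual resolve Steven mentions is the documented split-brain recovery procedure from the DRBD 8.x user's guide (the resource name `vksshared` is taken from the status output later in this thread); run it only after deciding which node's data to discard:

```
# on the node whose changes will be DISCARDED (the split-brain "victim"):
drbdadm secondary vksshared
drbdadm -- --discard-my-data connect vksshared

# on the node whose data survives, if it went StandAlone:
drbdadm connect vksshared
```

The victim then resyncs from the survivor; only the victim's divergent changes are lost, not the whole device.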
From: Kaloyan Kovachev [mailto:kkovachev at varna.net]
Sent: January-31-12 11:12 AM
To: Robert.Koeppl at knapp.com
Cc: Xing, Steven; drbd-user at lists.linbit.com; Digimer
Subject: Re: Antwort: Re: [DRBD-user] question about start drbd on single node after a power outage
On Tue, 31 Jan 2012 16:53:55 +0100, Robert.Koeppl at knapp.com wrote:
> You can force the node into primary by issuing "drbdadm --
> --overwrite-data-of-peer primary all". This is quite self-explanatory
> regarding its consequences when the other node comes up again.
> If you issue a "drbdadm disconnect all" after that, you make sure it does
> not overwrite the other node when it comes up, so you can do some recovery
> on the other node in case you are missing some data and are lucky.
> "drbdadm connect all" should reestablish the connection later on.
You may also use 'invalidate-remote' ... keep in mind a full sync will take place. Taking a snapshot from the 'before-resync-target' handler may help restore some data without disconnecting.
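A drbd.conf sketch of that snapshot idea, using the LVM helper scripts that ship with DRBD 8.3 (paths may differ on your distribution; this assumes the backing device is an LVM logical volume):

```
# /etc/drbd.conf fragment (sketch)
handlers {
    # runs on the sync target just before resync starts overwriting local
    # blocks; snapshots the backing LV so pre-resync data stays recoverable
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    # removes the snapshot again once the resync finished cleanly
    after-resync-target  "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
}
```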
If you are absolutely sure that's what you want, then for auto-recovery use:
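The configuration that followed here did not survive in the archive. The standard auto-recovery knobs from the drbd.conf man page look like this; the policies below are only an example (and they do discard data), not necessarily what Kaloyan posted:

```
# net section of a resource in /etc/drbd.conf (example policies only)
net {
    after-sb-0pri discard-zero-changes;  # no primary: keep the node that wrote
    after-sb-1pri discard-secondary;     # one primary: the secondary loses
    after-sb-2pri disconnect;            # two primaries: refuse to auto-resolve
}
```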
> Mit freundlichen Grüßen / Best Regards
> Robert Köppl
> Customer Support & Projects
> Teamleader IT Support
> KNAPP Systemintegration GmbH
> Waltenbachstraße 9
> 8700 Leoben, Austria
> Phone: +43 3842 805-322
> Fax: +43 3842 82930-500
> robert.koeppl at knapp.com
> "Xing, Steven" <SXing at BroadViewNet.com> Gesendet von:
> drbd-user-bounces at lists.linbit.com
> 31.01.2012 16:46
> "Kaloyan Kovachev" <kkovachev at varna.net>, "Digimer"
> <linux at alteeve.com> Kopie drbd-user at lists.linbit.com Thema
> Re: [DRBD-user] question about start drbd on single node after a power
> You were both faster with your responses ;)
> Thanks all. I understand that there is a high risk of split brain and data
> loss if I force one node up.
> But in my case, service recovery is much more important than avoiding data loss.
> I have a 2-node pacemaker cluster, with drbd running as master/slave
> (controlled by pacemaker).
> When the power goes out on both nodes (just like unplugging the power
> cables from both nodes at the same time), drbd seems to have no chance to
> do anything (such as fence the peer). Thus, after I boot up only the
> previous primary node,
> I see pacemaker trying to promote drbd to master, but it fails;
> if I run drbdadm status, it shows:
> <drbd-status version="8.3.11" api="88">
> <resources config_file="/etc/drbd.conf">
> <resource minor="0" name="vksshared" cs="WFConnection" ro1="Secondary"
> ro2="Unknown" ds1="Consistent" ds2="DUnknown" />
> I tried setting both of the wfc timeouts to 5 sec; that did not work.
> As you can see, the drbd service started, but it cannot be promoted, since
> ds1="Consistent", and only "UpToDate" will work.
> Only when I boot up the other node, just after the two drbd instances
> connect, does drbd declare the disk status "UpToDate".
> If I do not boot up the other node, I have not found an automatic way,
> whether safe or not, to force it to think itself "UpToDate".
> It seems drbd does not remember its previous running status (primary or
> secondary), for some safety reason.
> Do you have any ideas/comments on this? I looked into the docs and could
> not find any setting that can make this happen, even an unsafe one.
> If I upgrade drbd to the latest version, will it help?
> -----Original Message-----
> From: Kaloyan Kovachev [mailto:kkovachev at varna.net]
> Sent: January-31-12 9:35 AM
> To: Digimer
> Cc: Xing, Steven; drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] question about start drbd on single node after a power outage
> You were faster than me :)
> On Tue, 31 Jan 2012 09:04:49 -0500, Digimer <linux at alteeve.com> wrote:
>> If you want to force the issue though, you can use 'wfc-timeout 300',
>> which will tell DRBD to wait up to 5 minutes for its peer. After that
>> time, it will consider itself primary. Please don't use this, though,
>> until you've exhausted all other ways of starting safely.
> There are two (well documented) options in drbd.conf - wfc-timeout and
> degr-wfc-timeout. To avoid split brain I set both to 0 (wait indefinitely).
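Kaloyan's settings as a drbd.conf fragment (a sketch; values from this thread):

```
# common section of /etc/drbd.conf (sketch)
startup {
    wfc-timeout      0;  # 0 = wait indefinitely for the peer on a clean start
    degr-wfc-timeout 0;  # likewise when the cluster was already degraded
}
```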
> If you need to skip waiting, you can do it manually from the console, in
> case you start drbd standalone or before cman / pacemaker.
> In my case the device is exported via iSCSI (not as a cluster resource),
> so I have an additional wait loop for both nodes to become UpToDate for
> _all_ configured resources before exporting any of them - 'no data' is
> better than 'broken data'. Yes, I have been bitten by the latter (luckily
> during the preproduction phase), and believe me, you don't want that on
> production nodes (unless you have static read-only data).
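Kaloyan's pre-export wait could look roughly like this (a sketch, not his actual script; `drbdadm dstate all` prints one `local/peer` state pair per configured resource):

```shell
# block until every configured resource is UpToDate on both sides,
# then (and only then) start exporting via iSCSI
wait_all_uptodate() {
    # any line that is not "UpToDate/UpToDate" keeps us waiting
    while drbdadm dstate all | grep -qv '^UpToDate/UpToDate$'; do
        sleep 5
    done
}
```

Call `wait_all_uptodate` from the init script that starts the iSCSI target, before the export is brought up.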