[DRBD-user] Re: question about start drbd on single node after a power outage

Robert.Koeppl at knapp.com Robert.Koeppl at knapp.com
Tue Jan 31 16:53:55 CET 2012



Hi!
You can force the node into primary by issuing "drbdadm -- 
--overwrite-data-of-peer primary all". This is quite self-explanatory 
regarding its consequences when the other node comes up again.
If you issue a "drbdadm disconnect all" after that, you make sure it does 
not overwrite the other node when it comes up, so you can do some recovery 
on the other node in case you are missing some data and are lucky. 
"drbdadm connect all" should re-establish the connection later on.
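The sequence above can be sketched as a small script. This is a dry-run sketch, not a tested procedure: the run() helper and the DRY_RUN switch are my additions so the commands can be previewed before anything is executed; the drbdadm invocations themselves are the ones named above.

```shell
#!/bin/sh
# DRY_RUN defaults to 1 here, so the sketch only prints the commands;
# set DRY_RUN=0 to actually execute them (as root, on the surviving node).
DRY_RUN="${DRY_RUN:-1}"

# run() is a hypothetical helper, not part of drbd: print or execute.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Force this node primary; its data becomes the authoritative copy.
run drbdadm -- --overwrite-data-of-peer primary all
# 2. Disconnect immediately so the old peer is not overwritten when it
#    boots; recover any missing data from it first.
run drbdadm disconnect all
# 3. Once recovery is done, reconnect and let the nodes resync.
run drbdadm connect all
```

The disconnect step is what buys you the recovery window: until you run the final connect, nothing you do on the other node can be clobbered by a resync.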
Best Regards

Robert Köppl

Customer Support & Projects 
Teamleader IT Support

KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria 
Phone: +43 3842 805-322
Fax: +43 3842 82930-500
robert.koeppl at knapp.com 
www.KNAPP.com 

Commercial register number: FN 138870x
Commercial register court: Leoben




"Xing, Steven" <SXing at BroadViewNet.com> 
Sent by: drbd-user-bounces at lists.linbit.com
31.01.2012 16:46

To
"Kaloyan Kovachev" <kkovachev at varna.net>, "Digimer" <linux at alteeve.com>
Cc
drbd-user at lists.linbit.com
Subject
Re: [DRBD-user] question about start drbd on single node after a power 
outage






Your responses were both faster. ;)
Thanks all. I understand that there is a high risk of split brain and data 
loss if I force one node to come up alone.
But in my case, service recovery is much more important than data loss.

I have a 2-node pacemaker cluster, with drbd running as master/slave 
(controlled by pacemaker).
When there is a power outage on both nodes (just as if the power cables 
were unplugged from both nodes at the same time),
drbd seems to have no chance to do anything (such as fence the peer). 
Thus, after I boot up only the previous primary node,
I see pacemaker trying to promote drbd to master, but it fails; if I run 
drbdadm status, it shows:

<drbd-status version="8.3.11" api="88">
<resources config_file="/etc/drbd.conf">
<resource minor="0" name="vksshared" cs="WFConnection" ro1="Secondary" 
ro2="Unknown" ds1="Consistent" ds2="DUnknown" />
</resources>
</drbd-status>

I tried setting both of the wfc timeouts to 5 seconds; that did not work.

As you can see, the drbd service started, but it cannot be promoted, since 
ds1="Consistent"; only "UpToDate" will work.
Only when I boot up the other node, just after the two drbd instances 
connect, does drbd declare the disk status as "UpToDate".

If I do not boot up the other node, I have not found an automatic way, 
safe or not, to force it to consider itself "UpToDate".
It seems drbd does not remember its previous running status (primary or 
secondary), for safety reasons.

Do you have any ideas/comments on this? I looked into the docs but could 
not find any setting that can make this happen, even an unsafe one.
If I upgrade drbd to the latest version, will it help?


-----Original Message-----
From: Kaloyan Kovachev [mailto:kkovachev at varna.net] 
Sent: January-31-12 9:35 AM
To: Digimer
Cc: Xing, Steven; drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] question about start drbd on single node after a 
power outage

You were faster than me :)

On Tue, 31 Jan 2012 09:04:49 -0500, Digimer <linux at alteeve.com> wrote:
> 
> If you want to force the issue though, you can use 'wfc-timeout 300', 
> which will tell DRBD to wait up to 5 minutes for its peer. After that 
> time, it will consider itself primary. Please don't use this, though, 
> until you've exhausted all other ways of starting safely.

There are two (well documented) options in drbd.conf - wfc-timeout and 
degr-wfc-timeout. To avoid split-brain I set both to 0.
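For reference, a minimal drbd.conf fragment with the two timeouts mentioned above; the values shown are the ones discussed in this thread (0 means wait indefinitely for the peer), not a general recommendation:

```
# /etc/drbd.conf (fragment) - startup waiting behaviour
startup {
    wfc-timeout       0;   # wait forever for the peer on normal startup
    degr-wfc-timeout  0;   # same when the cluster was already degraded
}
```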

If you need to skip waiting, you can do this manually from the console, in 
case you start drbd standalone or before cman / pacemaker.

In my case it is exported via iSCSI (not as a cluster resource), so I have 
an additional wait loop for both nodes to become UpToDate for _all_ 
configured resources before exporting any of them - 'no data' is better 
than 'broken data'. Yes, I have been bitten by the latter (luckily during 
the preproduction phase), and believe me, you don't want that on 
production nodes (unless you have static read-only data).
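A sketch of such a wait loop (my own reconstruction, not the poster's actual script; the all_uptodate helper name is an assumption). It parses "drbdadm dstate"-style output, one "local/peer" disk state per line, e.g. "UpToDate/UpToDate":

```shell
#!/bin/sh
# all_uptodate reads one "local/peer" disk state per line (the format
# printed by "drbdadm dstate all") and succeeds only if every resource
# reports UpToDate on both sides.
all_uptodate() {
    ok=1
    while read -r state; do
        [ "$state" = "UpToDate/UpToDate" ] || ok=0
    done
    [ "$ok" = "1" ]
}

# Poll until all resources are in sync, then export via iSCSI.
# Guarded so the sketch is a no-op on hosts without drbd installed.
if command -v drbdadm >/dev/null 2>&1; then
    until drbdadm dstate all | all_uptodate; do
        sleep 5
    done
    # ... start the iSCSI target here ...
fi
```

The point of checking _all_ resources, not just the one being exported, is exactly the 'no data is better than broken data' rule above: nothing is offered to initiators until the whole set is consistent.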
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


