[DRBD-user] Dual-primary split-brain recovery after reboot

Lars Ellenberg lars.ellenberg at linbit.com
Fri Jun 24 22:40:15 CEST 2011


On Fri, Jun 24, 2011 at 01:03:35PM -0600, Pete Ashdown wrote:
> On 06/24/2011 12:43 PM, Lars Ellenberg wrote:
> >
> > Once you know that, you can either fix it yourself,
> > (and kindly tell the list what the problem was).
> > Or come back here with that additional information,
> > and ask for further assistance, and we will try to help fixing it, and
> > possibly even find out why it "went wrong" in the first place.
> >
> 
> Thank you Lars.  I have spent most of this week trying to get it to work,
> so I am not merely looking for a "click here solution".  I've tried
> tweaking the after-sb-* settings and init scripts and it always comes back
> in StandAlone disconnected mode.  I've also tried forcing secondary upon
> shutdown, but it usually fails due to the "held open by someone" error.

> /etc/drbd.conf:

You cannot configure away users of DRBD.
You need to stop whatever is using DRBD first.

Maybe start by checking whether "bad things"
happen on shutdown, on reboot, or on both?

If you do reboot into single user mode,
then manually configure DRBD,
does it still "detect split brain"?

If not, shutdown is ok,
and reboot is the problem.
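
A minimal sketch of that check, for reading the connection state back
after you have configured DRBD by hand. The helper name drbd_cstate is
made up for illustration and is not part of the DRBD tools; it only
parses /proc/drbd-style text like the snippet quoted further down.

```shell
#!/bin/sh
# Hypothetical helper (not part of drbd-utils): print the connection
# state (the "cs:" field) of one minor, reading /proc/drbd-style text
# from stdin.
drbd_cstate() {
    minor=$1
    sed -n "s/^ *${minor}: cs:\([A-Za-z]*\) .*/\1/p"
}

# Usage on a real node, e.g. in single user mode after configuring
# DRBD manually:
#   drbd_cstate 0 < /proc/drbd
```

If this prints StandAlone after a manual bring-up in single user mode,
the split brain was already there before the reboot sequence got
involved.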

> resource r0 {
>         protocol C;
>         startup {
>                 wfc-timeout  15;
>                 degr-wfc-timeout 60;

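This is very unusual, and likely not what you meant.
Typically wfc-timeout is (much) larger than degr-wfc-timeout:
degr-wfc-timeout is used instead of wfc-timeout only when the node
was already part of a degraded cluster before the reboot, so the peer
is less likely to show up at all.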

>         become-primary-on both;

If things are ok after shutdown,
then maybe "become-primary-on both" is just a bad idea (in your setup).

Possibly "upstart" gets things wrong.  You could try to drop it,
and use only the good old System V style init.

Maybe it is a bad choice to try and make things primary from the init
script (in your setup; actually, I'd say in most setups).

If that turns out to be the case,
defer promoting DRBD (and starting whatever is supposed to use it)
to some script/job/agent/hook that is running later.
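
As a rough sketch of such a later job, assuming the resource name r0
from your config: poll the connection state and promote only once the
peers are actually connected. The function name, the retry count and
the one second poll interval are illustrative assumptions, not
anything DRBD ships.

```shell
#!/bin/sh
# Hypothetical helper: promote a resource only once it reports
# "Connected". DRBDADM may be overridden, as in the init script.
DRBDADM=${DRBDADM:-drbdadm}

promote_when_connected() {
    res=$1
    tries=${2:-30}          # illustrative default: about 30 seconds
    while [ "$tries" -gt 0 ]; do
        if [ "$($DRBDADM cstate "$res" 2>/dev/null)" = "Connected" ]; then
            $DRBDADM primary "$res"
            return $?
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "$res: not Connected, refusing to promote" >&2
    return 1
}

# Usage, from a job that runs after networking is up:
#   promote_when_connected r0 && start_whatever_uses_drbd   # placeholder
```

You may also want to look at "drbdadm dstate" before promoting; this
sketch deliberately checks the connection state only.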

Possibly forget about the init script (chkconfig off,
update-rc.d disable, or whatever the correct debian incantation is),
and instead have pacemaker manage stuff?

>         }
>         net {
>                 cram-hmac-alg sha1;
>                 shared-secret "secret";
>         allow-two-primaries;
>         after-sb-0pri discard-younger-primary;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri call-pri-lost-after-sb;

You are aware that you configured automatic data loss there, right?
Just because something is "younger" does not mean a thing.
Just because something is "secondary" *at the point in time of the
connection handshake* does not mean it has bad data.

>         }
>         on servera {
>                 device /dev/drbd0;
>                 disk /dev/md2;
>                 address 10.0.0.5:7788;
>                 meta-disk internal;
>         }
> 
>         on serverb {
>                 device /dev/drbd0;
>                 disk /dev/md2;
>                 address 10.0.0.6:7788;
>                 meta-disk internal;
>         }
> }
> 
> Syslog upon reboot and during manual recovery: 
> http://pastebin.com/Rq9Yyp4A

Jun 24 12:30:39 serverb kernel: [  221.439427] block drbd0: incompatible after-sb-0pri settings
Jun 24 12:30:39 serverb kernel: [  221.443142] block drbd0: conn( WFReportParams -> Disconnecting ) 

How come the after-sb settings are "incompatible"?
They should be the same on both peers.
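
One way to check, sketched under the assumption that you copy the
peer's config file over next to the local one: pull the after-sb-*
lines out of each file in a normalised form and diff them. The helper
name and the /tmp paths are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the after-sb-* policies from a
# drbd.conf-style file, normalised (leading whitespace stripped, runs of
# whitespace squeezed, trailing ";" dropped, sorted) so that the output
# from two nodes can be compared line by line.
after_sb_settings() {
    grep -E '^[[:space:]]*after-sb-[012]pri[[:space:]]' "$1" |
        sed -e 's/^[[:space:]]*//' \
            -e 's/[[:space:]][[:space:]]*/ /g' \
            -e 's/ *; *$//' |
        sort
}

# Usage, assuming serverb's file was copied to /tmp/drbd.conf.serverb:
#   after_sb_settings /etc/drbd.conf         > /tmp/sb.servera
#   after_sb_settings /tmp/drbd.conf.serverb > /tmp/sb.serverb
#   diff /tmp/sb.servera /tmp/sb.serverb && echo "after-sb policies match"
```

Note that identical files do not prove identical running settings: a
node keeps using the old values until the changed config has been
applied there, for example with "drbdadm adjust".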

> Note the "sync from peer node", which is
> fine, but isn't automatically done by the init script.  I have to manually
> run the following in a short time window after system reboot to recover
> completely:
>            drbdadm secondary all
>            drbdadm -- --discard-my-data connect all
>            drbdadm primary all
> 
> 
> I inserted a "cp /proc/drbd /tmp" in /etc/init.d/drbd right before the
> "$DRBDADM sh-b-pri all # Become primary if configured" line and this is
> what it shows:
> 
> version: 8.3.7 (api:88/proto:86-91)
> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at serverb, 2011-06-17 12:50:42
>  0: cs:Disconnecting ro:Secondary/Unknown ds:UpToDate/DUnknown C r----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


