[DRBD-user] Please help... After reboot I'm always getting unresolved split brain (DRBD+OCFS2)

Jacek Osiecki cjosh at silvercube.pl
Thu Jan 24 18:04:05 CET 2013

On Thu, 24 Jan 2013, Felix Frank wrote:

> On 01/22/2013 05:04 PM, Jacek Osiecki wrote:
>> [41706.085879] block drbd0: PingAck did not arrive in time.
>> [41706.085888] block drbd0: peer( Primary -> Unknown ) conn( Connected
>> -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>> [41706.086007] block drbd0: new current UUID
>> 62770026DDB5FC9D:1AD40906305F01A9:E24FA72FCFB3A8FD:E24EA72FCFB3A8FD
> Uhuh. So, your peer just cut the network connection without
> deconfiguring DRBD first (i.e., unmount, go Secondary etc.).

OK, this is something :)
My first split brain happened when one of the nodes lost power abruptly.
But later it kept recurring even after regular reboots...
And yes, you're right. It seems that the network was set to shut down
before DRBD (actually, even before o2cb and ocfs2 too!).
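
For reference, the shutdown order discussed above can be sketched as a small
dry-run script. This is only a sketch under assumptions: the mount point
/mnt/shared is hypothetical (not taken from this setup), the o2cb and ocfs2
init scripts are assumed to live in /etc/init.d like the drbd one, and
"drbdadm secondary all" / "drbdadm down all" stand in for whatever the
distro's drbd init script does on stop. By default it only prints the steps.

```shell
#!/bin/sh
# Dry run by default: each step is printed instead of executed.
# Set DRY_RUN=0 (as root) to really run the commands.
DRY_RUN="${DRY_RUN:-1}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/shared}"   # hypothetical mount point

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Everything here has to finish while the network is still up.
stop_stack() {
    run umount "$MOUNTPOINT" &&          # 1. release the OCFS2 filesystem
    run /etc/init.d/ocfs2 stop &&        # 2. stop the ocfs2 service
    run /etc/init.d/o2cb stop &&         # 3. stop the o2cb cluster stack
    run drbdadm secondary all &&         # 4. demote DRBD to Secondary
    run drbdadm down all                 # 5. disconnect and detach DRBD
}

stop_stack
```

Only after the last step returns should the network be brought down, so the
peer sees a clean disconnect instead of a NetworkFailure.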

I have set the correct order, but it seems that drbd doesn't want to stop:

root@oscar ~> /etc/init.d/drbd stop
DRBD module version: 8.3.11
    userland version: 8.4.1
preferably kernel and userland versions should match.
Stopping all DRBD resources: umount: /dev/drbd0: not mounted
all: Failure: (127) Device minor not allocated
ERROR: Module drbd is in use
Retrying once...
umount: /dev/drbd0: not mounted
all: Failure: (127) Device minor not allocated
ERROR: Module drbd is in use

Any ideas? Am I right that if drbd stopped cleanly, well before the
network shuts down, the node would reconnect as primary-primary after
a reboot?

> Aside #1 your userland mismatch is rather drastic, you probably want an
> 8.3 userland.

That's how it is in my Linux distro... I've got a much newer kernel, but
it seems it still includes only the 8.3 drbd module.

> Aside #2, are you sure you need dual-primary operation? Can vservers
> live-migrate already and noone told me? ;-)

I'm using it for an HA configuration, where two machines serve the same
data through a WWW server. The machines sit behind a load balancer. It
would be nice if, after rebooting any node, it would refresh its data from
the other node and then start serving data as usual...
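
A check along these lines could gate when a rebooted node goes back behind
the load balancer. It is a minimal sketch, assuming the 8.3-style
/proc/drbd status line for minor 0 (cs:... ro:... ds:...); the function
name drbd0_ready is made up for illustration.

```shell
#!/bin/sh
# Reads DRBD status text on stdin. Succeeds only if minor 0 is
# connected to its peer and both disks are UpToDate.
drbd0_ready() {
    grep -Eq '^ *0: cs:Connected .*ds:UpToDate/UpToDate'
}

# Intended use on a real node (not run here):
#   drbd0_ready < /proc/drbd && echo "drbd0 in sync, OK to serve"
```

While a resync is still running, or after a split brain (cs:StandAlone),
the check fails and the node stays out of rotation.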

Jacek Osiecki
josiecki at silvercube.pl

Silvercube s.c.
ul. Makuszynskiego 4
31-752 Kraków
+48 (12) 684 21 00

