[DRBD-user] dopd failover / undo outdate

Thu Dec 6 10:38:04 CET 2007

> I think I'm starting to get my mind around why and how dopd works, but
> currently when the primary node goes down (pull the plug) the secondary
> node is getting the message to outdate the drbd resource.  Here is where
> I think it is happening on node2 after I pull the plug on node1:
> -------------------------------------------------------------------
> Dec  4 13:22:13 svr92 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
> Dec  4 13:22:13 svr92 kernel: drbd0: disk( UpToDate -> Outdated )
> Dec  4 13:22:13 svr92 kernel: drbd0: outdate-peer helper broken, returned 255
> Dec  4 13:22:13 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
> -------------------------------------------------------------------
> I can't find any reference to what "returned 255" means but the
> outdate-peer appears to be broken???

I see this as well right now (as you may have noticed from my thread 
about resource fencing). See if you can start drbd-peer-outdater -p 
<peername> -r <resourcename> by hand (while heartbeat and dopd are running).

On my test system, this segfaults and returns 255, which might indicate 
that it also segfaults when it is run by drbd (or heartbeat?).
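
To see what the helper really returns when you run it by hand, a small
wrapper like the one below might help. It is only a sketch: describe_exit
is a made-up helper name, and the signal check assumes a shell such as
bash or dash, which reports a process killed by a signal as 128 plus the
signal number (139 for SIGSEGV). A plain 255 is printed as-is.

```shell
#!/bin/sh
# describe_exit (hypothetical helper): run a command, then print how it ended.
describe_exit() {
    "$@"
    status=$?
    # Assumption: bash/dash convention of 128 + signal number for signal deaths.
    if [ "$status" -gt 128 ] && [ "$status" -le 192 ]; then
        echo "exit status $status (likely killed by signal $((status - 128)))"
    else
        echo "exit status $status"
    fi
    return "$status"
}

# Example call; PEER and RESOURCE are placeholders you have to set yourself:
# describe_exit drbd-peer-outdater -p "$PEER" -r "$RESOURCE"
```

Run it while heartbeat and dopd are up, so the conditions match what drbd
sees when it calls the helper.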

> So . . . node1 goes down and somehow in the process outdates node2's
> resource so heartbeat can't bring it up.  There goes my redundancy BUT
> if node1 really is dead, how can I undo the outdate flag on node2 so I
> can bring that node up as the primary until I can fix node1?

You can always do "drbdadm -- --overwrite-data-of-peer primary 
<resourcename>". This will bring your outdated resource into primary 
state. But you don't want that to happen automatically.
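
If you want to keep that step manual but harder to run by accident, a
guard along these lines could work. Again just a sketch: force_primary
and the DRBDADM override are my own invention, not part of drbd or
heartbeat; only the drbdadm command line itself is the one quoted above.

```shell
#!/bin/sh
# DRBDADM can be overridden (e.g. DRBDADM=echo) for a dry run. Hypothetical knob.
DRBDADM=${DRBDADM:-drbdadm}

# force_primary (hypothetical helper): promote an outdated resource to primary,
# but only after the operator types the resource name as confirmation.
force_primary() {
    resource=$1
    if [ -z "$resource" ]; then
        echo "usage: force_primary <resourcename>" >&2
        return 2
    fi
    printf 'Is the peer really dead? Type "%s" to force it primary: ' "$resource" >&2
    read -r answer
    if [ "$answer" != "$resource" ]; then
        echo "aborted" >&2
        return 1
    fi
    # Same command as given above; it overrides the Outdated state.
    "$DRBDADM" -- --overwrite-data-of-peer primary "$resource"
}
```

With DRBDADM=echo the function only prints the command it would run,
which is a cheap way to check the guard before you need it for real.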

Oh, and btw: this is also true for the test version 2.1.3 of heartbeat, 
which was announced for testing yesterday and is scheduled for Dec 19th. 
It would be nice if this got fixed for 2.1.3.

Regards
Dominik