[DRBD-user] Split Brain due to 'diskless' state with pacemaker/heartbeat

Florian Haas florian at hastexo.com
Sat Jun 2 00:20:14 CEST 2012

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On 06/01/12 18:22, Lars Ellenberg wrote:
> There is one improvement we could make in DRBD:
> call the fence-peer handler not only for connection loss,
> but also for peer disk failure.

That sounds like a good and simple idea to me.

>>> Alternatively, a constraint in pacemaker on the diskless state until
>>> a re-sync has completed.
>>
>> You could actually do that by using the crm-fence-peer.sh handler as
>> your local-io-error handler, albeit with two drawbacks:
>>
>> 1. The local-io-error handler has an exit code convention that is
>> different from the fence-peer one (so you'd need to use a wrapper).
> 
> exit code of local-io-error handler is ignored

Makes sense. Good to know.
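
For the record, the hookup would then be as simple as this (just a
sketch; the resource name and the stock handler path are assumed):

  resource r0 {
    handlers {
      # exit code is ignored here, so no wrapper is needed after all
      local-io-error "/usr/lib/drbd/crm-fence-peer.sh";
    }
  }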

>> 2. In order to actually mask the I/O error from your upper layers, you'd
>> now have to call "drbdadm detach" from the local-io-error handler, and
>> iirc calling drbdadm from a drbdadm handler is a bad idea.
> 
> local-io-error handler is called after the device was detached already.
> it is just an additional action.

Oh. That I didn't know, and the man page doesn't say so. Well then that
approach is out anyway.

>>> Any Suggestions?
>>
>> Lars: would it make sense for a Secondary that detaches (either by user
>> intervention or after an I/O error) to at least _try_ to outdate itself
>> in the metadata?
> 
> I think it does.
> 
> There is a related scenario:

Related, yes. This one is about a dual-node failure, whereas Philip's
scenario is about one node failure plus one disk failure. So even if the
dual-node failure can't be helped, Philip's problem might be.
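
Coming back to the outdate-on-detach idea for a moment: until DRBD does
that itself, the deliberate-detach case could presumably be approximated
by hand. Untested sketch, resource name assumed; drbdadm may well insist
on a disconnect before the outdate:

  drbdadm outdate r0   # mark the local copy Outdated in its metadata
  drbdadm detach r0    # then drop the backing device

That obviously cannot help with the I/O error case, which is why doing
it inside DRBD would be the real fix.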

>  Alice crashes.
> 
>  Bob was primary already, or it took over, does not matter.
>  Bob continues to modify data.
> 
>  Bob down (clean or unclean, does not matter).
> 
>  Alice comes back.
> 
>  Now what?
>    What should a single node (in a two node cluster) do after startup?
>    It does not know if it has good or bad data.
>    Even if Bob had placed a constraint,
>    in this scenario that constraint cannot make it to Alice.
> 
>    So there you have your policy decision.
>    If you do not know for sure,
>      Do you want to stay down just in case,
>      risking downtime for no reason?
> 
>      Do you want to go online, despite your doubts,
>      risking going online with stale data?
> 
> With multiple failures, you will always be able to construct
> a scenario where you end up at the above policy decision.

Of course.
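
(For anyone following along: the constraint Bob would have placed via
crm-fence-peer.sh is just a location rule in the CIB, roughly like this
in crm shell syntax, with the constraint id and resource names assumed:

  location drbd-fence-by-handler-r0 ms_drbd_r0 \
    rule $role="Master" -inf: #uname ne bob

Since it lives only in the cluster configuration, it indeed has no way
of reaching Alice while she is down.)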

> 
> In any case, if you configure fencing resource-and-stonith,
> drbd comes up as "Consistent" only (not UpToDate),
> so it needs to fence the peer, or promotion will fail.

Only with resource-and-stonith? Isn't this true for fencing
resource-only as well?

> If the peer is unreachable (exit code 5), and DRBD is only Consistent,
> drbd counts that as fail, and will refuse promotion.

I thought I recalled that this was changed at some point, but I can't
dig up a commit to that effect, so I'll take your word for it.
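
If anyone wants to watch that decision being made, a trivial wrapper
that logs the handler's verdict and passes the exit code through
unchanged would do. Purely illustrative (DRBD_RESOURCE is set by DRBD
for its handlers):

  #!/bin/sh
  # run the real handler, log its verdict, and hand the exit code
  # (4 = peer outdated, 5 = peer unreachable, 7 = peer stonithed)
  # straight back to DRBD
  /usr/lib/drbd/crm-fence-peer.sh "$@"
  rc=$?
  logger -t fence-peer "verdict $rc for $DRBD_RESOURCE"
  exit $rc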

Philip, it's actually not entirely clear whether you're using any
fencing at all. Your per-resource "disk" section has no "fencing"
policy, so it would appear that you're not, but the common section you
posted is incomplete, so you may have set it there. At any rate, setting
it to at least "resource-only" is necessary for your crm-fence-peer.sh
script to ever execute.
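
In case it helps, a minimal example of what that would look like, with
the resource name assumed and everything unrelated omitted:

  resource r0 {
    disk {
      fencing resource-only;
    }
    handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }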

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now


