Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 01/06/2012 23:20, Florian Haas wrote:
> On 06/01/12 18:22, Lars Ellenberg wrote:
>> There is one improvement we could make in DRBD:
>> call the fence-peer handler not only for connection loss,
>> but also for peer disk failure.
> That sounds like a good and simple idea to me.
>
>>>> Alternatively, a constraint in pacemaker on diskless state until a
>>>> re-sync has been completed.
>>> You could actually do that by using the crm-fence-peer.sh handler as
>>> your local-io-error handler, albeit with two drawbacks:
>>>
>>> 1. The local-io-error handler has an exit code convention that is
>>> different from the fence-peer one (so you'd need to use a wrapper).
>> The exit code of the local-io-error handler is ignored.
> Makes sense. Good to know.
>
>>> 2. In order to actually mask the I/O error from your upper layers, you'd
>>> now have to call "drbdadm detach" from the local-io-error handler, and
>>> iirc calling drbdadm from a drbdadm handler is a bad idea.
>> The local-io-error handler is called after the device was detached
>> already. It is just an additional action.
> Oh. That I didn't know, and the man page doesn't say so. Well then that
> approach is out anyway.
>
>>>> Any suggestions?
>>> Lars: would it make sense for a Secondary that detaches (either by user
>>> intervention or after an I/O error) to at least _try_ to outdate itself
>>> in the metadata?
>> I think it does.
>>
>> There is a related scenario:
> Related, yes. This one is about a dual-node failure. But Philip's
> scenario is about 1 node, 1 disk failure. So even if the dual-node
> failure can't be helped, Philip's problem might be.
>
>> Alice crashes.
>>
>> Bob was primary already, or it took over, does not matter.
>> Bob continues to modify data.
>>
>> Bob goes down (clean or unclean, does not matter).
>>
>> Alice comes back.
>>
>> Now what?
>> What should a single node (in a two-node cluster) do after startup?
>> It does not know if it has good or bad data.
>> Even if Bob had placed a constraint,
>> in this scenario that constraint cannot make it to Alice.
>>
>> So there you have your policy decision.
>> If you do not know for sure,
>> do you want to stay down just in case,
>> risking downtime for no reason?
>>
>> Do you want to go online, despite your doubts,
>> risking going online with stale data?
>>
>> With multiple failures, you will always be able to construct
>> a scenario where you end up at the above policy decision.
> Of course.
>
>> In any case, if you configure fencing resource-and-stonith,
>> DRBD comes up as "Consistent" only (not UpToDate),
>> so it needs to fence the peer, or promotion will fail.
> resource-and-stonith only? Isn't this true for fencing resource-only as
> well?
>
>> If the peer is unreachable (exit code 5), and DRBD is only Consistent,
>> DRBD counts that as a failure, and will refuse promotion.
> I had thought I recalled that that was changed at some point, but I
> can't dig up a commit to that effect, so I'll take your word for it.
>
> Philip, it's actually not entirely clear whether you're using any
> fencing or not. Your per-resource "disk" section has no "fencing"
> policy, so it would appear that you're not, but your common section is
> incomplete so you may have set it there. At any rate, setting it to
> "resource-only" is necessary for your crm-fence-peer.sh script to ever
> execute.

Sorry, I did mean to include this information: it is fencing
resource-and-stonith (as you would expect). The crm-fence-peer handler
does fire and works perfectly OK.

> Cheers,
> Florian
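For illustration, the local-io-error hookup debated above would look
roughly like the sketch below. This is a hypothetical snippet (the
resource name r0 is assumed), not something the thread settled on; per
Lars, the handler only runs after the device has already been detached,
and its exit code is ignored:

    resource r0 {
      disk {
        # invoke the local-io-error handler on lower-level I/O errors
        on-io-error call-local-io-error;
      }
      handlers {
        # per Lars: runs after the device has already been detached, and
        # the exit code is ignored, so no wrapper around the script is needed
        local-io-error "/usr/lib/drbd/crm-fence-peer.sh";
      }
    }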
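Philip's confirmed setup, fencing resource-and-stonith with
crm-fence-peer.sh, corresponds to a configuration along these lines (a
minimal sketch; the resource name and the stock handler paths are
assumptions, adjust for your installation):

    resource r0 {
      disk {
        # suspend I/O and call the fence-peer handler on connection loss
        fencing resource-and-stonith;
      }
      handlers {
        # places a Pacemaker constraint that blocks promotion of the peer
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # removes that constraint once the peer has resynced
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }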
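To observe the "Consistent, not UpToDate" state Lars describes on a node
that starts up without its peer, the disk and connection states can be
queried directly (r0 is again an assumed resource name):

    # local/peer disk state, e.g. "Consistent/DUnknown" after an isolated start
    drbdadm dstate r0
    # connection state, e.g. "WFConnection" while the peer is unreachable
    drbdadm cstate r0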