[DRBD-user] drbdsetup primary with timeout [was: To stonith or not to stonith?]
alanr at unix.sh
Fri Jun 23 15:27:28 CEST 2006
Lars Ellenberg wrote:
> / 2006-05-30 12:40:12 -0500
> \ Dave Dykstra:
>> Reviving an old thread ...
> thanks for the reminder...
>>>>> I think that doing multiple tries in the drbddisk command is a
>>>>> hack, though, especially since it doesn't take into account any
>>>>> change in the "timeout" parameter that there may be in
>>>>> drbd.conf. I think the 'drbdsetup primary' command (possibly
>>>>> with a new option that drbddisk invokes) should try to contact
>>>>> the remote side and wait until there is either a positive
>>>>> response or a timeout before it exits with an error.
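A minimal sketch of the behavior proposed above: keep retrying the promotion until it succeeds or a deadline derived from the drbd.conf timeout passes. This is illustrative only, not the actual drbdsetup implementation; `try_with_timeout` and its parameters are hypothetical names.

```shell
# Hypothetical helper: retry a command (e.g. "drbdsetup /dev/drbd0 primary")
# once per second until it succeeds or the deadline expires.
try_with_timeout() {
    cmd=$1       # command to retry; a placeholder for the real drbdsetup call
    deadline=$2  # total seconds to keep trying (would come from drbd.conf)
    elapsed=0
    while [ "$elapsed" -lt "$deadline" ]; do
        if $cmd; then
            return 0        # positive response: promotion succeeded
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1                # timed out: exit with an error as proposed
}
```

The point of folding this into drbdsetup itself, rather than the resource script, is that the deadline can then track the "timeout" parameter in drbd.conf automatically.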
>>>> what is there now is a "hack".
>>>> it is a misconfiguration, though, when the heartbeat deadtime is
>>>> smaller than the drbd ping time.
>>>> still it could be desirable to have an option like the one outlined
>>>> above, "drbdsetup /dev/drbd0 primary --I-think-peer-is-dead", and
>>>> this option would typically be used by the heartbeat resource script
>>> I think rather it should be something like
>>> --I-think-peer-may-be-dead because the heartbeat resource script
>>> would do the same thing no matter how it is coming up.
>>>> this will probably be implemented in 0.8 ...
>> I see that the latest 8.0 pre-release code in subversion is still
>> using a loop count of 6 in the drbddisk script and is not using an
>> option like the one we discussed. If this is still quite low on the
>> priority list, I suggest that the loop count maximum in drbddisk be
>> increased for now, because it's easy and it does work.
>> Lars, what do you think?
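For contrast, the count-based "hack" being discussed looks roughly like the following. This is an illustrative sketch, not the actual drbddisk source; `retry_primary` is a hypothetical name, and the only detail taken from the thread is the fixed attempt count of 6, which is independent of the timeout configured in drbd.conf.

```shell
# Hypothetical sketch of a fixed-count retry loop, in the style of the
# one in drbddisk: N attempts, regardless of any configured timeout.
retry_primary() {
    cmd=$1       # the promotion command; placeholder for the drbdsetup call
    maxtries=$2  # fixed loop count (6 in the script under discussion)
    i=0
    while [ "$i" -lt "$maxtries" ]; do
        if $cmd; then
            return 0    # promotion succeeded on this attempt
        fi
        i=$((i + 1))
    done
    return 1            # all attempts exhausted
}
```

Because the count is hard-coded, it cannot track a changed "timeout" in drbd.conf, which is exactly the objection raised earlier in the thread.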
> we put this on the roadmap again.
> but actually, since we (try to) do certain state changes now with
> "cluster wide synchronisation", this should meanwhile be a non-issue
> for drbd8 svn, and the retry loop in the script can probably go.
> maybe we don't do this correctly yet and bail out too early, without
> verifying whether what we think about the peer's status is still true.
> but that would be a bug and should be fixed.
I think the problem comes up when the primary is dead. Can you do
'cluster wide synchronisation' in that case?
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William