Lars Ellenberg wrote:
> / 2006-05-30 12:40:12 -0500
> \ Dave Dykstra:
>> Reviving an old thread ...
>
> thanks for the reminder...
>
>>>>> I think that doing multiple tries in the drbddisk command is a
>>>>> hack, though, especially since it doesn't take into account any
>>>>> change in the "timeout" parameter that there may be in
>>>>> drbd.conf. I think the 'drbdsetup primary' command (possibly
>>>>> with a new option that drbddisk invokes) should try to contact
>>>>> the remote side and wait until there is either a positive
>>>>> response or a timeout before it exits with an error.
>>>> what is there is a "hack".
>>>>
>>>> it is a misconfiguration, when heartbeat deadtime was
>>>> smaller than drbd ping time, though.
>>>>
>>>> still it could be desirable to have an option like that outlined
>>>> above, "drbdsetup /dev/drbd0 primary --I-think-peer-is-dead", and
>>>> this option would typically be used by the heartbeat resource
>>>> script/agent.
>>> I think rather it should be something like
>>> --I-think-peer-may-be-dead because the heartbeat resource script
>>> would do the same thing no matter how it is coming up.
>>>
>>>> this will probably be implemented in 0.8 ...
>> I see that the latest 8.0 pre-release code in subversion is still
>> using a loop count of 6 in the drbddisk script and is not using an
>> option like one we discussed. If this is still quite low on the
>> priority list, I suggest that the loop count maximum in drbddisk be
>> increased for now because it's easy and it does work.
>>
>> Lars, what do you think?
>
> we put this on the roadmap again.
>
> but actually, since we (try to) do certain state changes now with
> "cluster wide synchronisation", this should be a non issue meanwhile
> for drbd8 svn, and the retry loop in the script can probably go.
>
> maybe we don't do this correctly yet and bail out too early without
> verification whether what we think about the peers status is still true.
> but that would be a bug and should be fixed.
I think the problem comes up when the primary is dead. Can you do
'cluster wide synchronisation' in that case?

-- 
    Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce