On 06/01/12 11:16, Philip Gaw wrote:
> 1. Secondary goes into diskless state due to a broken array (as
> expected) while primary is being written to
> 2. Primary then dies (power failure)
> 3. Secondary gets rebooted, or dies and comes back online etc.
>
> The secondary will become primary - and as secondary was 'diskless'
> before, contains out of date stale data.
>
> When the 'real' primary comes back online we then have our split brain.
>
>
> What I think needs to happen is a way to mark 'diskless' state as
> outdated so that pacemaker will not attempt to bring this node into
> primary.

That's a catch-22. The "outdated" state is stored locally in the DRBD
metadata, which we don't have access to if the resource is Diskless.

> As this disk is diskless with internal metadata, this cannot be stored
> in the drbd metadata.

Which presently rules out the Outdated state. So you figured that part
out yourself.

> Alternatively, a constraint in pacemaker on diskless state until a
> re-sync has been completed.

You could actually do that by using the crm-fence-peer.sh handler as
your local-io-error handler, albeit with two drawbacks:

1. The local-io-error handler has an exit code convention that is
   different from the fence-peer one (so you'd need to use a wrapper).

2. In order to actually mask the I/O error from your upper layers, you'd
   now have to call "drbdadm detach" from the local-io-error handler,
   and iirc calling drbdadm from a drbdadm handler is a bad idea.

> Any Suggestions?

Lars: would it make sense for a Secondary that detaches (either by user
intervention or after an I/O error) to at least _try_ to outdate itself
in the metadata? Granted, if there is an actual I/O problem that also
affects the metadata area this would fail, and if you've got an I/O
tarpit it might actually exacerbate the problem, but at least DRBD could
try. Or does it do that already?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
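
The catch-22 and the "try to outdate before detaching" idea discussed above can be pictured with a small toy model. This is only a sketch and not DRBD code: the `ToyNode` class, its fields, and its methods are hypothetical names made up for illustration, and the real promotion logic in DRBD and Pacemaker is considerably more involved. The model assumes that the "outdated" flag lives only in on-disk metadata, that it cannot be written once the node is diskless, and that the write may fail if the I/O problem also hits the metadata area.

```python
from dataclasses import dataclass


class MetadataIOError(Exception):
    """Raised when the (simulated) on-disk metadata cannot be written."""


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Secondary; not a DRBD data structure."""
    name: str
    attached: bool = True      # False models the Diskless state
    metadata_ok: bool = True   # False models an I/O error hitting the metadata area
    outdated: bool = False     # flag that exists only in the on-disk metadata

    def mark_outdated(self) -> None:
        # The flag is stored in local metadata, so it needs an attached,
        # writable backing device.
        if not self.attached:
            raise MetadataIOError("diskless: metadata is not accessible")
        if not self.metadata_ok:
            raise MetadataIOError("metadata area is not writable")
        self.outdated = True

    def detach(self, try_outdate_first: bool) -> bool:
        """Detach the backing device; return True if the flag was persisted."""
        flagged = False
        if try_outdate_first:
            try:
                self.mark_outdated()
                flagged = True
            except MetadataIOError:
                # Best effort only: the attempt may fail, but we still detach.
                pass
        self.attached = False
        return flagged

    def may_promote(self) -> bool:
        # Toy promotion check: only a persisted flag can veto promotion.
        return not self.outdated
```

In this model, a node that detaches without the attempt stays promotable and can no longer be flagged afterwards, which is the catch-22. A node that tries first is protected only if the metadata write happens to succeed.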