On Thu, Apr 5, 2012 at 8:34 PM, Brian Chrisman <brchrisman at gmail.com> wrote:
> I have a shared/parallel filesystem on top of drbd dual primary/protocol C
> (using 8.3.11 right now).

_Which_ filesystem precisely?

> My question is about recovering after a network outage where I have a
> 'resource-and-stonith' fence handler which panics both systems as soon as
> possible.

Self-fencing is _not_ how a resource-and-stonith fencing handler is
meant to operate.

> Even with Protocol-C, can the bitmaps still have dirty bits set? (ie,
> different writes on each local device which haven't returned/acknowledged to
> the shared filesystem because they haven't yet been written remotely?)

The bitmaps only apply to background synchronization. Foreground
replication does not use the quick-sync bitmap.

> Maybe a more concrete example will make my question clearer:
> - node A & B (2 node cluster) are operating nominally in primary/primary
> mode (shared filesystem provides locking and prevents simultaneous write
> access to the same blocks on the shared disk).
> - node A: write to drbd device, block 234567, written locally, but remote
> copy does not complete due to network failure
> - node B: write to drbd device, block 876543, written locally, but remote
> copy does not complete due to network failure

Makes sense up to here.

> - Both writes do not complete and do not return successfully to the
> filesystem (protocolC).

You are aware that "do not return successfully" means that no
completion is signaled, which is correct, but not that non-completion
is signaled, which would be incorrect?

> - Fencing handler is invoked, where I can suspend-io and/or panic both nodes
> (since neither one is reliable at this point).

"Panicking" a node is pointless, and panicking both is even worse.
What fencing is meant to do is use an alternate communications channel
to remove the _other_ node, not the local one. And only one of them
will win.

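To illustrate "only one of them will win", here is a toy Python model of
the fencing race. It is only a sketch and not DRBD code: Node,
FencingDevice, resolve_split and panic_both are hypothetical names, and
the "device" stands in for whatever out-of-band channel (for example a
power switch) your fencing handler would actually use.

```python
class Node:
    """Toy cluster node; 'alive' means it is still powered on."""
    def __init__(self, name):
        self.name = name
        self.alive = True


class FencingDevice:
    """Hypothetical out-of-band device reachable without the replication link."""
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def fence(self, requester, target):
        # A node that has already been shot cannot shoot back.
        if not self.nodes[requester].alive:
            return False
        self.nodes[target].alive = False
        return True


def resolve_split(order):
    """Each node tries to fence its peer, in the given order of arrival."""
    a, b = Node("A"), Node("B")
    device = FencingDevice([a, b])
    peer = {"A": "B", "B": "A"}
    for name in order:
        device.fence(name, peer[name])
    return sorted(n.name for n in (a, b) if n.alive)


def panic_both():
    """The self-fencing approach from the question: both nodes take themselves down."""
    a, b = Node("A"), Node("B")
    for n in (a, b):
        n.alive = False
    return sorted(n.name for n in (a, b) if n.alive)
```

Whichever node reaches the device first survives, so resolve_split
always leaves exactly one node running, while panic_both leaves none.
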
> If there is a chance of having unreplicated/unacknowledged writes on two
> different disks (those writes can't conflict, because the shared filesystem
> wont write to the same blocks on both nodes simultaneously), is there a
> resync option that will effectively 'revert' any unreplicated/unacknowledged
> writes?

Yes, it's called the Activity Log, but you've got this part wrong as
you're under an apparent misconception as to what the fencing handler
should be doing.

> I am considering writing a test for this and would like to know a bit more
> about what to expect before I do so.

Tell us what exactly you're trying to achieve please?

Florian

--
Need help with High Availability?
http://www.hastexo.com/now