Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On 01/15/2012 08:18 AM, Lars Ellenberg wrote:
> Some comments on where I think that script's logic
> is incomplete, still:
>
> First, if you manage to get a simultaneous cluster crash,
> and then only one node comes back, you'll be offline,
> and need admin intervention to get online again.
> There is no easy way around that, though,
> so that is a common problem to all such setups.

This is a valid concern, but if I understand right, it's more a problem
of "both crash, one recovers". If both recover, drbd should reconnect
and do its magic, never calling this script.

Assuming that is correct, and barring a suggestion on reliably
determining that it's ok to go UpToDate/Primary, I'd rather leave things
hung for an admin to deal with, given the alternative risk of data loss.

> # Features
> # - Clusters > 2 nodes supported, provided
>
> drbd.conf can have more than two "on $uname {}" or "floating $ip {}"
> sections per resource, to accommodate a "floating" setup,
> i.e. several nodes able to access the same data set,
> which may be a FC/iSCSI SAN, or a lower level DRBD in a stacked setup.
>
> If you have exactly two such nodes, your assumptions should work.
>
> If you have more than two such "on $uname {}" sections in drbd.conf,
> you need to be aware that:
>
> # These are the environment variables set by DRBD. See 'man drbd.conf'
> # -> 'handlers'.
> env => {
>     # The resource triggering the fence.
>     'DRBD_RESOURCE'     => $ENV{DRBD_RESOURCE},
>     # The resource minor number.
>     'DRBD_MINOR'        => $ENV{DRBD_MINOR},
>     # This is 'ipv4' or 'ipv6'
>     'DRBD_PEER_AF'      => $ENV{DRBD_PEER_AF},
>     # The address of the peer(s).
>     'DRBD_PEER_ADDRESS' => $ENV{DRBD_PEER_ADDRESS},
>
> DRBD_PEER_ADDRESS and _AF are both singular, and set to the currently
> configured peer, if any. They may also be empty, if there is more than
> one potential peer, and none of them is currently configured.

I've deleted all but DRBD_RESOURCE and DRBD_PEERS now, as I wasn't using
the others anyway.

> DRBD_PEERS, however, is plural,
> and will contain a space separated list of possible peer unames,
> or may be empty if that list could not be determined (maybe because
> DRBD_PEER_ADDRESS was not set).
>
>     # The peer(s) hostname(s)
>     'DRBD_PEERS'        => $ENV{DRBD_PEERS},
> },

Ah, I was expecting 'DRBD_PEERS' to be only the one peer that went
silent and needed to be fenced, even in a stacked setup. Ok, support for
2-node DRBD (though in a multi-node cluster) it shall be. If someone
wants to see support for stacked setups, they can contribute a patch. :)

> So you may want to document that your expectation is a classic two node
> DRBD configuration, even if those nodes may be part of a > 2 node cluster.

Done.

> # Example output showing what the bits mean.
> # +--< Current data generation UUID >-
> # |                +--< Bitmap's base data generation UUID >-
> # |                |                +--< younger history UUID >-
> # |                |                |                +-< older history >-
> # V                V                V                V
> # C3864FB60759430F:0000000000000000:A8C791FB53E8ED2B:A8C691FB53E8ED2B:1:1:1:1:0:0:0
> #                                                                     ^ ^ ^ ^ ^ ^ ^
> #                                         -< Data consistency flag >--+ | | | | | |
> #                                -< Data was/is currently up-to-date >--+ | | | | |
> #                                     -< Node was/is currently primary >--+ | | | |
> #                                   -< Node was/is currently connected >--+ | | |
> #            -< Node was in the progress of setting all bits in the bitmap >--+ | |
> #                          -< The peer's disk was out-dated or inconsistent >--+ |
> #          -< This node was a crashed primary, and has not seen its peer since >--+
> #
> # flags: Primary, Connected, UpToDate
>
> # The sixth value will be 1 (UpToDate) or 0 (other).
> ($conf->{sys}{local_res_uptodate}, $conf->{sys}{local_res_was_current_primary})=($status_line=~/.*?:.*?:.*?:.*?:\d:(\d):(\d):\d:\d:\d:\d/);
>
> to_log($conf, 0, __LINE__, "DEBUG: UpToDate: [$conf->{sys}{local_res_uptodate}]") if $conf->{sys}{debug};
> to_log($conf, 0, __LINE__, "DEBUG: Was Current Primary: [$conf->{sys}{local_res_was_current_primary}]") if $conf->{sys}{debug};
>
> You want the current disk state of this resource,
> and refuse unless that reports UpToDate.

Done.

> This test is not sufficient:
> to_log($conf, 1, __LINE__,
>     "Local resource: [$conf->{env}{DRBD_RESOURCE}] is NOT 'UpToDate',
>     will not fence peer.")
>     if not $conf->{sys}{local_res_uptodate};
>
> It does not reflect the current state, but the state as stored in our
> "meta data flags".
> It will always say it "was" UpToDate,
> if it is Consistent, and does not *know* to be Outdated.
> Maybe it was more clear if we had the inverse logic,
> and named that flag "is certain to contain outdated data".
>
> You are interested in a state not expressed here (and not easily
> possible to express in persistent meta data flags):
> It is Consistent, it knows it *was* UpToDate,
> neither self nor peer is marked Outdated:
> it does not know yet if the peer has better data or not.
>
> Also, in general I'd recommend to avoid calling drbdsetup,
> either explicitly, or implicitly via drbdadm, from a fence-peer-handler.
> In earlier drbd versions that would reliably timeout without response,
> it possibly would work now, but....
>
> If you look at crm-fence-peer.sh,
> you'll notice that I grep the current state from /proc/drbd.

Changed to parse /proc/drbd, so the above concerns should not be in play
anymore.
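To illustrate the general approach, here is a minimal sketch of pulling
the local disk state for one resource out of /proc/drbd from Perl. This
is not the actual rhcs_fence code: rhcs_fence works from DRBD_RESOURCE,
while the sketch below keys off the minor number for brevity, and the
/proc/drbd line layout assumed in the comment is the usual DRBD
8.3/8.4-era output.

#!/usr/bin/perl
# Simplified sketch: read the current local disk state of one DRBD minor
# from /proc/drbd, and only agree to fence if it is UpToDate.
use strict;
use warnings;

# DRBD exports DRBD_MINOR to its handlers; allow a command-line argument
# as a fallback for testing by hand.
my $minor = defined $ENV{DRBD_MINOR} ? $ENV{DRBD_MINOR} : shift @ARGV;
die "No DRBD minor number given.\n" if not defined $minor;

my $disk_state = "";
open my $proc, "<", "/proc/drbd" or die "Failed to read /proc/drbd: $!\n";
while (my $line = <$proc>)
{
        # Status lines look like:
        #  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        # The local disk state is the 'ds:' value before the slash.
        if ($line =~ /^\s*\Q$minor\E:\s.*?\bds:(\w+)\//)
        {
                $disk_state = $1;
                last;
        }
}
close $proc;

if ($disk_state eq "UpToDate")
{
        print "Local disk is UpToDate; it is safe to try to fence the peer.\n";
        exit 0;
}
else
{
        print "Local disk state is [$disk_state]; refusing to fence the peer.\n";
        exit 1;
}

A real fence-peer handler would go on from here to actually fence the
peer and return an exit code DRBD understands; the sketch stops at the
disk state check.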
> And this part of the logic is not good, explained below.
> to_log($conf, 1, __LINE__,
>     "Local resource: [$conf->{env}{DRBD_RESOURCE}] was NOT 'Current Primary' and
>     likely recovered from being fenced, will not fence peer.")
>     if not $conf->{sys}{local_res_was_current_primary};
>
> Scenario:
> All good, replicating, but only *one* Primary now.
> Primary node crash/power outage/whatever.
>
> Cluster wants to now promote the remaining, still Secondary node.
>
> Secondary, when promoted without established replication link,
> will call the fence-peer handler during promotion.
>
> Your handler will "fail", because it is not marked "current primary",
> causing the promotion to be aborted.

Relying now on the local disk state only, this should also be a
non-issue, I believe.

> Rest of the logic looked OK at first glance.
>
> Thank you.
>
> Would you rather keep this separately,
> or should we start distributing this
> (or some later revision of it) with DRBD?

As soon as you think it's well enough fleshed out, I would be happy to
see it included directly with DRBD itself.

Changes pushed to github. I plan to make some changes as per fabionne's
suggestions shortly, but 06a126f315b2f1cbcf2bc7485507815266d34
(https://github.com/digimer/rhcs_fence/commit/06a126f315b2f1cbcf2bc7485507815266d34926)
reflects your feedback.

Cheers!

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again. stupid hawking radiation." - epitron
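For anyone wiring this up: DRBD only invokes a fence-peer handler such
as rhcs_fence when a fencing policy is configured for the resource. A
minimal drbd.conf fragment in the DRBD 8.3/8.4 syntax of the time might
look like the sketch below; the install path /usr/sbin/rhcs_fence is an
assumption, so point it at wherever the script actually lives.

resource r0 {
        disk {
                # The default policy is "dont-care", under which the
                # fence-peer handler is never called.
                fencing resource-and-stonith;
        }
        handlers {
                # Path is an assumption; adjust to the installed location.
                fence-peer "/usr/sbin/rhcs_fence";
        }
        # The device, disk, meta-disk and the two "on $uname {}" sections
        # are unchanged by the fencing setup and are omitted here.
}

With resource-and-stonith, DRBD freezes I/O on the resource while the
handler runs, which is why the handler has to be conservative about when
it reports success.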