Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 01/15/2012 08:18 AM, Lars Ellenberg wrote:
> Some comments on where I think that script's logic
> is incomplete, still:
>
> First, if you manage to get a simultaneous cluster crash,
> and then only one node comes back, you'll be offline,
> and need admin intervention to get online again.
> There is no easy way around that, though,
> so that is a problem common to all such setups.
This is a valid concern, but if I understand right, it's more a problem
of "both crash, only one recovers". If both recover, DRBD should reconnect
and do its magic, never calling this script. Assuming that is correct,
and barring a suggestion on reliably determining that it's OK to go
UpToDate/Primary, I'd rather leave things hung for an admin to deal
with, given the alternative risk of data loss.
> # Features
> # - Clusters > 2 nodes supported, provided
>
> drbd.conf can have more than two "on $uname {}" or "floating $ip {}"
> sections per resource, to accommodate a "floating" setup,
> i.e. several nodes able to access the same data set,
> which may be a FC/iSCSI SAN, or a lower level DRBD in a stacked setup.
>
> If you have exactly two such nodes, your assumptions should work.
>
> If you have more than two such "on $uname {}" sections in drbd.conf,
> you need to be aware that:
>
> # These are the environment variables set by DRBD. See 'man drbd.conf'
> # -> 'handlers'.
> env => {
> # The resource triggering the fence.
> 'DRBD_RESOURCE' => $ENV{DRBD_RESOURCE},
> # The resource minor number.
> 'DRBD_MINOR' => $ENV{DRBD_MINOR},
> # This is 'ipv4' or 'ipv6'
> 'DRBD_PEER_AF' => $ENV{DRBD_PEER_AF},
> # The address of the peer(s).
> 'DRBD_PEER_ADDRESS' => $ENV{DRBD_PEER_ADDRESS},
>
> DRBD_PEER_ADDRESS and _AF are both singular, and set to the currently
> configured peer, if any. They may also be empty, if there is more than
> one potential peer, and none of them is currently configured.
I've deleted all but DRBD_RESOURCE and DRBD_PEERS now, as I wasn't using
the others anyway.
> DRBD_PEERS, however, is plural,
> and will contain a space separated list of possible peer unames,
> or may be empty if that list could not be determined (maybe because
> DRBD_PEER_ADDRESS was not set).
>
> # The peer(s) hostname(s)
> 'DRBD_PEERS' => $ENV{DRBD_PEERS},
> },
Ah, I was expecting 'DRBD_PEERS' to be only the one that went silent and
needed to be fenced, even in stacked setups. OK, support for 2-node DRBD
(though possibly in a multi-node cluster) it shall be. If someone wants to
see stacked setups supported, they can contribute a patch. :)
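Concretely, the guard I have in mind is something along these lines (a
simplified sketch, not the exact code in the script):

  # Refuse to run in floating/stacked setups where more than one peer
  # is possible; this handler only understands a single, named peer.
  my @peers = split ' ', ($ENV{DRBD_PEERS} || "");
  if (@peers != 1)
  {
      die "Expected exactly one peer name in DRBD_PEERS, got: [@peers]\n";
  }
  my $peer = $peers[0];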
> So you may want to document that your expectation is a classic two node
> DRBD configuration, even if those nodes may be part of a > 2 node cluster.
Done.
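For the documentation, the expected setup is roughly the following
(illustrative only; hostnames, backing devices and the handler's install
path are placeholders):

  resource r0 {
      protocol C;
      disk {
          fencing resource-and-stonith;
      }
      handlers {
          fence-peer "/usr/sbin/rhcs_fence";
      }
      on node1.example.com {
          device    /dev/drbd0;
          disk      /dev/sda5;
          meta-disk internal;
          address   192.168.1.1:7788;
      }
      on node2.example.com {
          device    /dev/drbd0;
          disk      /dev/sda5;
          meta-disk internal;
          address   192.168.1.2:7788;
      }
  }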
> # Example output showing what the bits mean.
> # +--< Current data generation UUID >-
> # | +--< Bitmap's base data generation UUID >-
> # | | +--< younger history UUID >-
> # | | | +-< older history >-
> # V V V V
> # C3864FB60759430F:0000000000000000:A8C791FB53E8ED2B:A8C691FB53E8ED2B:1:1:1:1:0:0:0
> # ^ ^ ^ ^ ^ ^ ^
> # -< Data consistency flag >--+ | | | | | |
> # -< Data was/is currently up-to-date >--+ | | | | |
> # -< Node was/is currently primary >--+ | | | |
> # -< Node was/is currently connected >--+ | | |
> # -< Node was in the progress of setting all bits in the bitmap >--+ | |
> # -< The peer's disk was out-dated or inconsistent >--+ |
> # -< This node was a crashed primary, and has not seen its peer since >--+
> #
> # flags: Primary, Connected, UpToDate
>
> # The sixth value will be 1 (UpToDate) or 0 (other).
> ($conf->{sys}{local_res_uptodate}, $conf->{sys}{local_res_was_current_primary})=($status_line=~/.*?:.*?:.*?:.*?:\d:(\d):(\d):\d:\d:\d:\d/);
>
> to_log($conf, 0, __LINE__, "DEBUG: UpToDate: [$conf->{sys}{local_res_uptodate}]") if $conf->{sys}{debug};
> to_log($conf, 0, __LINE__, "DEBUG: Was Current Primary: [$conf->{sys}{local_res_was_current_primary}]") if $conf->{sys}{debug};
>
> You want the current disk state of this resource,
> and refuse unless that reports UpToDate.
Done.
> This test is not sufficient:
> to_log($conf, 1, __LINE__,
> "Local resource: [$conf->{env}{DRBD_RESOURCE}] is NOT 'UpToDate',
> will not fence peer.")
> if not $conf->{sys}{local_res_uptodate};
>
> It does not reflect the current state, but the state as stored in our "meta data flags".
> It will always say it "was" UpToDate,
> if it is Consistent and does not *know* that it is Outdated.
> Maybe it would be clearer if we had the inverse logic,
> and named that flag "is certain to contain outdated data".
>
> You are interested in a state not expressed here (and not easy
> to express in persistent meta data flags):
> It is Consistent, it knows it *was* UpToDate,
> neither self nor peer is marked Outdated:
> it does not know yet if the peer has better data or not.
>
> Also, in general I'd recommend avoiding calls to drbdsetup,
> either explicitly, or implicitly via drbdadm, from a fence-peer handler.
> In earlier DRBD versions that would reliably time out without a response;
> it may work now, but....
>
> If you look at crm-fence-peer.sh,
> you'll notice that I grep the current state from /proc/drbd.
Changed to parse /proc/drbd, so the above concerns should not be in play
anymore.
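The parsing boils down to roughly this (paraphrased sketch; the helper
name is just for illustration, how the minor number is obtained is
glossed over here, and the actual code is in the commit linked below):

  # Pull the local disk state for the given minor out of /proc/drbd.
  # Status lines look like:
  #  0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
  sub get_local_disk_state
  {
      my ($minor) = @_;
      open my $proc, '<', '/proc/drbd' or die "Unable to read /proc/drbd: $!\n";
      while (my $line = <$proc>)
      {
          if ($line =~ /^\s*\Q$minor\E:\s+cs:\S+\s+ro:\S+\s+ds:(\w+)\//)
          {
              close $proc;
              return $1;   # e.g. "UpToDate", "Consistent", "Outdated"
          }
      }
      close $proc;
      return "";
  }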
> And this part of the logic is not good, explained below.
> to_log($conf, 1, __LINE__,
> "Local resource: [$conf->{env}{DRBD_RESOURCE}] was NOT 'Current Primary' and
> likely recovered from being fenced, will not fence peer.")
> if not $conf->{sys}{local_res_was_current_primary};
>
> Scenario:
> All good, replicating, but only *one* Primary now.
> Primary node crash/power outage/whatever.
>
> Cluster wants to now promote the remaining, still Secondary node.
>
> Secondary, when promoted without established replication link,
> will call the fence-peer handler during promotion.
>
> Your handler will "fail", because it is not marked "current primary",
> causing the promotion to be aborted.
Relying now on the local disk state only, this should also be a
non-issue, I believe.
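So the decision comes down to something like this (sketch only;
fence_peer() stands in for the actual call out to the cluster's fencing,
and exit code 7 is what I understand DRBD expects from a fence-peer
handler when the peer really was fenced):

  # Decide based solely on our own, current disk state, as parsed from
  # /proc/drbd in the sketch above.
  my $disk_state = get_local_disk_state($minor);
  if ($disk_state eq "UpToDate")
  {
      # Our copy is good; fence the peer so DRBD (and the promotion that
      # triggered us, if any) can proceed.
      exit 7 if fence_peer($conf);   # hypothetical wrapper around fence_node
      exit 1;                        # fencing itself failed
  }
  else
  {
      # Consistent/Outdated/Inconsistent: we cannot prove our data is
      # current, so refuse and leave it for an admin, as said above.
      exit 1;
  }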
> Rest of the logic looked OK at first glance.
>
> Thank you.
>
> Would you rather keep this separately,
> or should we start distributing this
> (or some later revision of it) with DRBD?
As soon as you think it's well enough fleshed out, I would be happy to
see it included directly with DRBD itself.
Changes pushed to GitHub. I plan to make some changes as per fabionne's
suggestions shortly, but commit 06a126f
(https://github.com/digimer/rhcs_fence/commit/06a126f315b2f1cbcf2bc7485507815266d34926)
reflects your feedback.
Cheers!
--
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron