[DRBD-user] Interesting issue with drbd 9 and fencing
Digimer
lists at alteeve.ca
Tue Feb 13 18:29:14 CET 2018
On 2018-02-13 09:39 AM, Lars Ellenberg wrote:
> On Sun, Feb 11, 2018 at 02:43:27AM -0500, Digimer wrote:
>> On 2018-02-11 01:42 AM, Digimer wrote:
>>
>>
>> Hi all,
>>
>> I've set up a 3-node cluster (config below). Basically, nodes 1 & 2 use protocol C and have resource-and-stonith fencing. The node 1 -> 3 and 2 -> 3 connections use protocol A and fencing is 'dont-care' (node 3 is
>> not part of the cluster and would only ever be promoted manually).
>>
>> When I crash node 2 via 'echo c > /proc/sysrq-trigger', pacemaker detects the fault and so does DRBD. DRBD invokes the fence handler as expected and all is good. However, to test
>> breaking just DRBD, on node 2 I used 'iptables -I INPUT -p tcp -m tcp --dport 7788:7790 -j DROP' to interrupt DRBD traffic. When this is done, the fence handler is not invoked.
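(For context, the fencing-related bits of that config look roughly like the
sketch below. Resource and host names match the logs; the handler paths are
the ones shipped with drbd-utils. Treat it as an illustrative sketch rather
than a verbatim copy of my drbd.conf.)

  resource srv01-c7_0 {
      # Tie DRBD fencing into pacemaker via the stock helper scripts.
      handlers {
          fence-peer    "/usr/lib/drbd/crm-fence-peer.9.sh";
          unfence-peer  "/usr/lib/drbd/crm-unfence-peer.9.sh";
      }

      # (per-node "on" sections with node-id/device/disk/address trimmed)

      # Node 1 <-> node 2: synchronous, fencing tied to stonith.
      connection {
          host m3-a02n01.alteeve.com;
          host m3-a02n02.alteeve.com;
          net {
              protocol C;
              fencing  resource-and-stonith;
          }
      }

      # Node 1 <-> node 3 (DR): async, no fencing.
      # (The node 2 <-> node 3 connection looks the same.)
      connection {
          host m3-a02n01.alteeve.com;
          host m3-a02dr01.alteeve.com;
          net {
              protocol A;
              fencing  dont-care;
          }
      }
  }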
>
> The iptables command may need to be changed to also drop --sport,
> and for good measure, add the same rules to the OUTPUT chain.
> DRBD connections are (or can be) established in both directions;
> you blocked only one direction.
>
> Maybe do it more like this:
> iptables -X drbd
> iptables -N drbd
> iptables -I INPUT -p tcp --dport 7788:7790 -j drbd
> iptables -I INPUT -p tcp --sport 7788:7790 -j drbd
> iptables -I OUTPUT -p tcp --dport 7788:7790 -j drbd
> iptables -I OUTPUT -p tcp --sport 7788:7790 -j drbd
>
> Then toggle:
> break: iptables -I drbd -j DROP
> heal: iptables -F drbd
>
> (beware of typos, I just typed this directly into the email)
>
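(A quick sanity check I'll use with that chain approach -- my own addition,
assuming the chain name and port range from the example above: the packet
counters on the "drbd" chain should climb in both directions while the DROP
rule is in place, and the peer should stay disconnected instead of quietly
re-establishing in the other direction.)

  # Watch the per-rule packet/byte counters on the drbd chain:
  iptables -nvL drbd

  # Check the connection state for the resource on both nodes;
  # it should sit in Connecting/Unconnected rather than Connected:
  drbdadm cstate srv01-c7_0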
>> Issue the iptables command on node 2. Journald output;
>>
>> ====
>> -- Logs begin at Sat 2018-02-10 17:51:59 GMT. --
>> Feb 11 06:20:18 m3-a02n01.alteeve.com crmd[2817]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: PingAck did not arrive in time.
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0: susp-io( no -> fencing)
>
> It goes into suspend-io due to the fencing policy, as configured.
>
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02dr01.alteeve.com: Preparing remote state change 1400759070 (primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA)
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02dr01.alteeve.com: Committing remote state change 1400759070
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( DUnknown -> Outdated )
>
> But state changes are relayed through all connected nodes,
> and node02 confirms that it now knows it is Outdated.
>
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0: new current UUID: 769A55B47EB143CD weak: FFFFFFFFFFFFFFFA
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0: susp-io( fencing -> no)
>
> Which means we may resume IO after bumping the data generation UUID,
> and don't have to call out to any additional handler "helper" scripts.
>
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: conn( Unconnected -> Connecting )
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: Handshake to peer 1 successful: Agreed network protocol version 112
>
> And since you only blocked one direction,
> we can establish a new connection anyway, in the other direction.
>
> We then do a micro (in this case even: empty) resync:
>
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: Began resync as SyncSource (will sync 0 KB [0 bits set]).
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: updated UUIDs 769A55B47EB143CD:0000000000000000:4CF0E17ADD9D1E0E:4161585F99D3837C
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: helper command: /sbin/drbdadm unfence-peer
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: helper command: /sbin/drbdadm unfence-peer exit code 0 (0x0)
>
> And call out to "unfence" just in case.
>
>> The cluster still thinks all is well, too.
>
> Pacemaker "status" shows DRBD in Master or Slave role,
> but cannot show and "disconnected" aspect of DRBD anyways.
>
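(Noted -- so the thing to check is DRBD itself, not pacemaker. Roughly what
I'll use from now on; resource name taken from the logs above:)

  # Per-peer connection, role and disk states, which pacemaker won't show:
  drbdadm status srv01-c7_0

  # One-shot dump of the current state from the events interface:
  drbdsetup events2 --now srv01-c7_0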
>> To verify, I can't connect to node 2;
>>
>> ==== [root@m3-a02n01 ~]# telnet m3-a02n02.sn 7788
>
> But node 2 could (and did) still connect to you ;-)
>
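(That explains the telnet result; telnet only proved one direction. A way to
see which side actually dialed out -- a sketch, using port 7788 from this
resource:)

  # List established DRBD sockets; the side whose local port is ephemeral
  # (not 7788) is the one that initiated the connection.
  ss -tnp '( sport = :7788 or dport = :7788 )'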
>> Note: I downed the DR node (node 3) and repeated the test. This time,
>> the fence handler was invoked. So I assume that DRBD did route through
>> the third node. Impressive!
>
> Yes, "sort of".
>
>> So, is protocol C between nodes 1 <-> 2 maintained when there is an intermediary node connected with protocol A?
>
> "cluster wide state changes" need to propagate via all available
> connections, and need to be relayed.
>
> Data is NOT (yet) relayed.
> One of those items listed in the todo book volume two :-)
Thanks for all this! This is helping me properly understand DRBD 9.
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould