[DRBD-user] Interesting issue with drbd 9 and fencing
Digimer
lists at alteeve.ca
Tue Feb 13 18:29:14 CET 2018
On 2018-02-13 09:39 AM, Lars Ellenberg wrote:
> On Sun, Feb 11, 2018 at 02:43:27AM -0500, Digimer wrote:
>> On 2018-02-11 01:42 AM, Digimer wrote:
>>
>>
>> Hi all,
>>
>> I've set up a 3-node cluster (config below). Basically, nodes 1 & 2 use protocol C and have resource-and-stonith fencing. The node 1 -> 3 and 2 -> 3 connections use protocol A and fencing is 'dont-care' (node 3 is
>> not part of the cluster and would only ever be promoted manually).
>>
>> When I crash node 2 via 'echo c > /proc/sysrq-trigger', pacemaker detects the fault and so does DRBD. DRBD invokes the fence handler as expected and all is good. However, to test
>> breaking just DRBD, on node 2 I used 'iptables -I INPUT -p tcp -m tcp --dport 7788:7790 -j DROP' to interrupt DRBD traffic. When this is done, the fence handler is not invoked.
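(For context, the fencing-related bits of that config look roughly like the
sketch below. Resource and host names match the logs; the handler paths are
the ones shipped with drbd-utils. Treat it as an illustrative sketch rather
than a verbatim copy of my drbd.conf.)

  resource srv01-c7_0 {
      # Tie DRBD fencing into pacemaker via the stock helper scripts.
      handlers {
          fence-peer    "/usr/lib/drbd/crm-fence-peer.9.sh";
          unfence-peer  "/usr/lib/drbd/crm-unfence-peer.9.sh";
      }

      # (per-node "on" sections with node-id/device/disk/address trimmed)

      # Node 1 <-> node 2: synchronous, fencing tied to stonith.
      connection {
          host m3-a02n01.alteeve.com;
          host m3-a02n02.alteeve.com;
          net {
              protocol C;
              fencing  resource-and-stonith;
          }
      }

      # Node 1 <-> node 3 (DR): async, no fencing.
      # (The node 2 <-> node 3 connection looks the same.)
      connection {
          host m3-a02n01.alteeve.com;
          host m3-a02dr01.alteeve.com;
          net {
              protocol A;
              fencing  dont-care;
          }
      }
  }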
>
> The iptables command may need to be changed to also drop --sport,
> and for good measure, add the same rules to the OUTPUT chain.
> DRBD connections are (or can be) established in both directions;
> you blocked only one direction.
>
> Maybe do it more like this:
> iptables -X drbd
> iptables -N drbd
> iptables -I INPUT -p tcp --dport 7788:7790 -j drbd
> iptables -I INPUT -p tcp --sport 7788:7790 -j drbd
> iptables -I OUTPUT -p tcp --dport 7788:7790 -j drbd
> iptables -I OUTPUT -p tcp --sport 7788:7790 -j drbd
>
> Then toggle:
> break: iptables -I drbd -j DROP
> heal: iptables -F drbd
>
> (beware of typos, I just typed this directly into the email)
>
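(A quick sanity check I'll use with that chain approach -- my own addition,
assuming the chain name and port range from the example above: the packet
counters on the "drbd" chain should climb in both directions while the DROP
rule is in place, and the peer should stay disconnected instead of quietly
re-establishing in the other direction.)

  # Watch the per-rule packet/byte counters on the drbd chain:
  iptables -nvL drbd

  # Check the connection state for the resource on both nodes;
  # it should sit in Connecting/Unconnected rather than Connected:
  drbdadm cstate srv01-c7_0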
>> Issue the iptables command on node 2. Journald output;
>>
>> ====
>> -- Logs begin at Sat 2018-02-10 17:51:59 GMT. --
>> Feb 11 06:20:18 m3-a02n01.alteeve.com crmd[2817]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: PingAck did not arrive in time.
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0: susp-io( no -> fencing)
>
> It goes into suspend-io due to the fencing policy, as configured.
>
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02dr01.alteeve.com: Preparing remote state change 1400759070 (primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA)
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02dr01.alteeve.com: Committing remote state change 1400759070
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( DUnknown -> Outdated )
>
> But state changes are relayed through all connected nodes,
> and node02 confirms that it now knows it is Outdated.
>
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0: new current UUID: 769A55B47EB143CD weak: FFFFFFFFFFFFFFFA
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0: susp-io( fencing -> no)
>
> Which means we may resume IO after bumping the data generation UUID,
> and don't have to call out to any additional handler "helper" scripts.
>
>> Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: conn( Unconnected -> Connecting )
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: Handshake to peer 1 successful: Agreed network protocol version 112
>
> And since you only blocked one direction,
> we can establish a new connection anyway, in the other direction.
>
> We then do a micro (in this case even: empty) resync:
>
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: Began resync as SyncSource (will sync 0 KB [0 bits set]).
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: updated UUIDs 769A55B47EB143CD:0000000000000000:4CF0E17ADD9D1E0E:4161585F99D3837C
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: helper command: /sbin/drbdadm unfence-peer
>> Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: helper command: /sbin/drbdadm unfence-peer exit code 0 (0x0)
>
> And call out to "unfence" just in case.
>
>> The cluster still thinks all is well, too.
>
> Pacemaker "status" shows DRBD in Master or Slave role,
> but cannot show and "disconnected" aspect of DRBD anyways.
>
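(Noted -- so the thing to check is DRBD itself, not pacemaker. Roughly what
I'll use from now on; resource name taken from the logs above:)

  # Per-peer connection, role and disk states, which pacemaker won't show:
  drbdadm status srv01-c7_0

  # One-shot dump of the current state from the events interface:
  drbdsetup events2 --now srv01-c7_0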
>> To verify, I can't connect to node 2;
>>
>> ==== [root@m3-a02n01 ~]# telnet m3-a02n02.sn 7788
>
> But node 2 could (and did) still connect to you ;-)
>
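(That explains the telnet result; telnet only proved one direction. A way to
see which side actually dialed out -- a sketch, using port 7788 from this
resource:)

  # List established DRBD sockets; the side whose local port is ephemeral
  # (not 7788) is the one that initiated the connection.
  ss -tnp '( sport = :7788 or dport = :7788 )'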
>> Note: I downed the DR node (node 3) and repeated the test. This time,
>> the fence handler was invoked. So I assume that DRBD did route through
>> the third node. Impressive!
>
> Yes, "sort of".
>
>> So, is protocol C between nodes 1 <-> 2 maintained when there is an intermediary node connected with protocol A?
>
> "cluster wide state changes" need to propagate via all available
> connections, and need to be relayed.
>
> Data is NOT (yet) relayed.
> One of those items listed in the todo book volume two :-)
Thanks for all this! This is helping me properly understand DRBD 9.
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould