[DRBD-user] Interesting issue with drbd 9 and fencing

Lars Ellenberg lars.ellenberg at linbit.com
Tue Feb 13 15:39:38 CET 2018


On Sun, Feb 11, 2018 at 02:43:27AM -0500, Digimer wrote:
> On 2018-02-11 01:42 AM, Digimer wrote:
> 
> 
>     Hi all,
> 
>       I've setup a 3-node cluster (config below). Basically, Node 1 & 2 are protocol C and have resource-and-stonith fencing. Node 1 -> 3 and 2 -> 3 are protocol A and fencing is 'dont-care' (it's
>     not part of the cluster and would only ever be promoted manually).
> 
>       When I crash node 2 via 'echo c > /proc/sysrq-trigger', pacemaker detects the fault and so does DRBD. DRBD invokes the fence-handler as expected and all is good. However, I want to test
>     breaking just DRBD, so on node 2 I used 'iptables -I INPUT -p tcp -m tcp --dport 7788:7790 -j DROP' to interrupt DRBD traffic. When this is done, the fence handler is not invoked.

The iptables command may need to be changed to also drop --sport,
and for good measure, add the same rules to the OUTPUT chain.
DRBD connections are (or can be) established in both directions;
you blocked only one direction.

Maybe do it more like this:
# create a dedicated "drbd" chain
# (the -X just removes a leftover chain from a previous run, if any)
iptables -X drbd
iptables -N drbd
# divert DRBD traffic in both directions, inbound and outbound
iptables -I INPUT  -p tcp --dport 7788:7790 -j drbd
iptables -I INPUT  -p tcp --sport 7788:7790 -j drbd
iptables -I OUTPUT -p tcp --dport 7788:7790 -j drbd
iptables -I OUTPUT -p tcp --sport 7788:7790 -j drbd

Then toggle:
break: iptables -I drbd -j DROP
 heal: iptables -F drbd

(beware of typos, I just typed this directly into the email)
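
To check that the block actually takes effect, something like this should do
(srv01-c7_0 being the resource name from your logs; adjust to taste):

# run on both nodes; the connection should leave "Connected"
# within ping-timeout once the DROP rule is in place
drbdadm cstate srv01-c7_0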

>       Issue the iptables command on node 2. Journald output;
> 
>     ====
>     -- Logs begin at Sat 2018-02-10 17:51:59 GMT. --
>     Feb 11 06:20:18 m3-a02n01.alteeve.com crmd[2817]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: PingAck did not arrive in time.
>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0: susp-io( no -> fencing)

I/O is suspended ("susp-io") due to the fencing policy, as configured.

>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02dr01.alteeve.com: Preparing remote state change 1400759070 (primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFA)
>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02dr01.alteeve.com: Committing remote state change 1400759070
>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( DUnknown -> Outdated )

But state changes are relayed through all connected nodes (here: via node 3),
and node 2 confirms that it now knows it is Outdated.

>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0: new current UUID: 769A55B47EB143CD weak: FFFFFFFFFFFFFFFA
>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0: susp-io( fencing -> no)

Which means we may resume I/O after bumping the data generation UUID,
and don't have to call out to any additional "helper" scripts.
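
If you want to watch that sequence (susp-io, pdsk, UUID bump) as it happens,
the DRBD 9 event stream shows it live; a quick sketch:

# follow all state changes for the resource on node 1
drbdsetup events2 srv01-c7_0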

>     Feb 11 06:28:57 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: conn( Unconnected -> Connecting )
>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: Handshake to peer 1 successful: Agreed network protocol version 112

And since you only blocked one direction,
we can establish a new connection in the other direction anyway.

We then do a micro (in this case even empty) resync:

>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: Began resync as SyncSource (will sync 0 KB [0 bits set]).
>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: updated UUIDs 769A55B47EB143CD:0000000000000000:4CF0E17ADD9D1E0E:4161585F99D3837C
>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0/0 drbd0 m3-a02n02.alteeve.com: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: helper command: /sbin/drbdadm unfence-peer
>     Feb 11 06:29:18 m3-a02n01.alteeve.com kernel: drbd srv01-c7_0 m3-a02n02.alteeve.com: helper command: /sbin/drbdadm unfence-peer exit code 0 (0x0)

And call out to "unfence" just in case.

>     The cluster still thinks all is well, too.

Pacemaker "status" shows DRBD in Master or Slave role,
but cannot show and "disconnected" aspect of DRBD anyways.
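
To actually see the connection state, ask DRBD itself rather than Pacemaker,
for example:

# crm_mon only knows Master/Slave; this shows per-peer connection
# and disk states, including "Connecting" for the blocked peer
drbdadm status srv01-c7_0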

>     To verify, I can't connect to node 2;
> 
>     ==== [root@m3-a02n01 ~]# telnet m3-a02n02.sn 7788

But node 2 could (and did) still connect to you ;-)
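
Which you can see from the established TCP sessions on node 1; a rough check:

# if the local column shows :7788 and the peer column an ephemeral port,
# the peer initiated the connection towards this node
ss -tn | grep -E ':(7788|7789|7790)'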

> Note: I down'ed the dr node (node 3) and repeated the test. This time,
> the fence-handler was invoked. So I assume that DRBD did route through
> the third node. Impressive!

Yes, "sort of".

> So, is the Protocol C between 1 <-> 2 maintained, when there is an intermediary node that is Protocol A?

"cluster wide state changes" need to propagate via all available
connections, and need to be relayed.

Data is NOT (yet) relayed.
One of those items listed in the todo book volume two :-)
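
You can confirm that while the direct link between node 1 and node 2 is down;
a rough sketch:

# on node 1: writes keep completing locally, but only mark blocks
# out-of-sync towards node 2 -- they are not forwarded via node 3
drbdsetup status srv01-c7_0 --statistics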

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed

