[DRBD-user] Resource Level Fencing Issue: DRBD9 with pacemaker/corosync
Takashi Sogabe
tasogabe at yahoo-corp.jp
Fri Jul 5 04:36:40 CEST 2019
Hello all,
I tested several versions of DRBD kernel drivers. The results are as follows.
- drbd-8.4.11-1
  - OK (worked as expected)
- drbd-9.0.15-1
  - NG (issue reproduced)
- drbd-9.0.19-0rc2
  - NG (issue reproduced)
Regards,
--
Takashi Sogabe
-----Original Message-----
From: drbd-user-bounces at lists.linbit.com <drbd-user-bounces at lists.linbit.com> On Behalf Of Takashi Sogabe
Sent: Monday, June 24, 2019 4:27 PM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] Resource Level Fencing Issue: DRBD9 with pacemaker/corosync
Hello all,
I encountered an issue while testing the behavior of resource-level fencing with pacemaker/corosync. Something seems to be wrong with the handling of fencing events in a quiescent situation (i.e., when no writes occur while the peers are disconnected).
I am new to DRBD, so I followed the procedure from the LINBIT documentation [1].
I would be grateful if someone could point out a misconfiguration, if my configuration is wrong. Otherwise, are there any known or potential issues in this area?
App packages:
- OS
- Ubuntu 18.04LTS
- DRBD packages
- drbd-utils
- 9.10.0-1ppa1~bionic1
- drbd-dkms
- 9.0.19~rc1-1ppa1~bionic1
- Pacemaker / Corosync
- pacemaker
- 1.1.18-0ubuntu1.1
- corosync
- 2.4.3-0ubuntu1.1
Network topology:
      +---------------+  <- (redundant link for resource-level fencing)
      |               |
    node1           node2
      |               |
      +-------+-------+  <- (DRBD data, NFS data)
              |
            node3
- node1
- DRBD Primary
- NFS Server
- node2
- DRBD Secondary
- NFS Server
- node3
- NFS Client
Description of the Issue:
1. When I blocked the replication connection between node1 and node2 using iptables,
fencing was invoked as expected.
# node1 syslog output:
Jun 24 05:51:22 node1 kernel: [17025.778428] drbd r0/0 drbd0 node2: pdsk( UpToDate -> DUnknown ) repl( Established ->Off )
Jun 24 05:51:22 node1 kernel: [17025.810749] drbd r0 node2: helper command: /sbin/drbdadm fence-peer
Jun 24 05:51:23 node1 kernel: [17026.624983] drbd r0/0 drbd0 node2: pdsk( DUnknown -> Outdated )
# node2 syslog output:
Jun 24 05:51:22 node2 kernel: [17020.463472] drbd r0/0 drbd0 node1: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
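For reference, the disconnection in step 1 can be reproduced with rules like the following. This is a sketch of what I did; port 7789 matches the r0 resource configuration below, and blocking only the DRBD port (rather than all traffic) keeps the corosync rings up so that only replication is cut.

```shell
# On node1: drop DRBD replication traffic to/from node2.
# Port 7789 is the one configured in /etc/drbd.d/r0.res.
sudo iptables -A INPUT  -s 192.168.1.12 -p tcp --dport 7789 -j DROP
sudo iptables -A OUTPUT -d 192.168.1.12 -p tcp --dport 7789 -j DROP

# Later, to restore the connection (step 2), delete the same rules:
sudo iptables -D INPUT  -s 192.168.1.12 -p tcp --dport 7789 -j DROP
sudo iptables -D OUTPUT -d 192.168.1.12 -p tcp --dport 7789 -j DROP
```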
2. After I restored the connection, the corresponding unfencing was not invoked,
because the pdsk state on node1 transitioned directly from 'Outdated' to 'UpToDate'.
[2] states that 'after-resync-target' is called on a resync target when the disk state
changes from Inconsistent to Consistent, i.e. when a resync finishes.
Since no resync took place here, it seems natural that nothing happened -- but as a
result the unfence handler is never called.
# node1 syslog output:
Jun 24 05:53:19 node1 kernel: [17143.117883] drbd r0/0 drbd0 node2: pdsk( Outdated -> UpToDate ) repl( Off -> Established )
# node2 syslog output:
Jun 24 05:53:19 node2 kernel: [17137.789972] drbd r0/0 drbd0 node1: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
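The practical consequence in step 2 is that the location constraint added by crm-fence-peer.9.sh stays in the CIB. A way to check for (and, as a workaround, manually remove) the stale constraint might look like the following; the exact constraint id is generated by the handler script, so the name shown is an assumption for this configuration:

```shell
# The fence-peer handler adds a location constraint to the CIB.
# If unfencing never runs, the constraint remains after reconnection:
sudo crm configure show | grep drbd-fence-by-handler

# Workaround: delete the stale constraint by hand (the id below is
# an assumed example for resource r0 and master/slave set ms_drbd_nfs):
# sudo crm configure delete drbd-fence-by-handler-r0-ms_drbd_nfs
```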
Note that if something was written to the DRBD disk after step 1, unfencing was invoked on reconnection, as follows.
1'. # node1 syslog output:
Jun 24 05:56:32 node1 kernel: [17336.114786] drbd r0/0 drbd0 node2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jun 24 05:56:32 node1 kernel: [17336.146599] drbd r0 node2: helper command: /sbin/drbdadm fence-peer
Jun 24 05:56:32 node1 kernel: [17336.186206] drbd r0/0 drbd0 node2: pdsk( DUnknown -> Outdated )
# node2 syslog output:
Jun 24 05:56:32 node2 kernel: [17330.741001] drbd r0/0 drbd0 node1: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
2'. # node1 syslog output:
Jun 24 06:01:54 node1 kernel: [17658.236522] drbd r0/0 drbd0 node2: pdsk( Outdated -> Consistent ) repl( Off -> WFBitMapS )
Jun 24 06:01:54 node1 kernel: [17658.237470] drbd r0/0 drbd0 node2: pdsk( Consistent -> Outdated )
Jun 24 06:01:54 node1 kernel: [17658.239750] drbd r0/0 drbd0 node2: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
Jun 24 06:01:54 node1 kernel: [17658.307340] drbd r0/0 drbd0 node2: pdsk( Inconsistent -> UpToDate ) repl( SyncSource-> Established )
Jun 24 06:01:54 node1 kernel: [17658.307445] drbd r0 node2: helper command: /sbin/drbdadm unfence-peer
# node2 syslog output:
Jun 24 06:01:54 node2 kernel: [17652.870284] drbd r0/0 drbd0 node1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
Jun 24 06:01:54 node2 kernel: [17652.908634] drbd r0/0 drbd0: disk( Outdated -> Inconsistent )
Jun 24 06:01:54 node2 kernel: [17652.908636] drbd r0/0 drbd0 node1: repl( WFBitMapT -> SyncTarget )
Jun 24 06:01:54 node2 kernel: [17652.918096] drbd r0/0 drbd0: disk( Inconsistent -> UpToDate )
Jun 24 06:01:54 node2 kernel: [17652.918098] drbd r0/0 drbd0 node1: repl( SyncTarget -> Established )
Jun 24 06:01:54 node2 kernel: [17652.918221] drbd r0/0 drbd0 node1: helper command: /sbin/drbdadm after-resync-target
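The write that distinguishes the 1'/2' case from the original case can be produced with a single command on the Primary while the peers are disconnected. The file name is arbitrary; the mount point matches the fs_nfs primitive below:

```shell
# On node1 (Primary), while node2 is disconnected: any synced write to
# the DRBD-backed filesystem makes the peer's disk Inconsistent on
# reconnect, forcing a real resync and thus the unfence-peer /
# after-resync-target handlers seen in the 2' logs.
sudo dd if=/dev/urandom of=/mnt/drbd/testfile bs=4k count=1 conv=fsync
```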
Pacemaker Configuration (crm interactive):
----
sudo crm configure
primitive drbd_nfs ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
ms ms_drbd_nfs drbd_nfs \
    meta master-max="1" master-node-max="1" \
    clone-max="2" clone-node-max="1" \
    notify="true"
primitive fs_nfs ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/r0/0" \
    directory="/mnt/drbd" fstype="xfs"
primitive ip_nfs ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" nic="enp0s8"
primitive nfsd lsb:nfs-kernel-server
group nfs fs_nfs ip_nfs nfsd
colocation nfs_on_drbd \
    inf: nfs ms_drbd_nfs:Master
order nfs_after_drbd \
    inf: ms_drbd_nfs:promote nfs:start
commit
exit
----
DRBD Configuration (/etc/drbd.d/r0.res):
----
resource r0 {
    protocol C;

    # for resource level fencing
    net {
        fencing resource-only;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }

    device    /dev/drbd0;
    disk      /dev/local/r0;
    meta-disk internal;

    on node1 {
        address 192.168.1.11:7789;
    }
    on node2 {
        address 192.168.1.12:7789;
    }
}
----
Corosync Configuration (/etc/corosync/corosync.conf):
----
totem {
    version: 2
    cluster_name: drbd
    secauth: off
    transport: udpu
    rrp_mode: active
}
nodelist {
    node {
        ring0_addr: 192.168.1.11
        ring1_addr: 172.24.1.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.12
        ring1_addr: 172.24.1.12
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
----
[1] "14. Integrating DRBD with Pacemaker clusters",
    The DRBD9 and LINSTOR User's Guide,
    https://docs.linbit.com/docs/users-guide-9.0/#ch-pacemaker
[2] "after-resync-target cmd", Section handlers Parameters, drbd.conf(5),
    https://docs.linbit.com/man/v9/drbd-conf-5/#Section_handlers_Parameters
Regards,
--
Takashi Sogabe <tasogabe at yahoo-corp.jp>