[DRBD-user] Resource Level Fencing Issue: DRBD9 with pacemaker/corosync

Takashi Sogabe tasogabe at yahoo-corp.jp
Fri Jul 5 04:36:40 CEST 2019


Hello all,

I tested several versions of the DRBD kernel driver. The results are as follows.

- drbd-8.4.11-1
  - OK (worked as expected)
- drbd-9.0.15-1
  - NG (issue reproduced)
- drbd-9.0.19-0rc2
  - NG (issue reproduced)
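
For reference, a quick way to confirm which DRBD kernel module is actually loaded on each node (a minimal sketch using standard tools):

----
# the version line appears in /proc/drbd on both 8.4 and 9.x
cat /proc/drbd
# or query the module directly
modinfo drbd | grep ^version
----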

Regards,
--
Takashi Sogabe


-----Original Message-----
From: drbd-user-bounces at lists.linbit.com <drbd-user-bounces at lists.linbit.com> On Behalf Of Takashi Sogabe
Sent: Monday, June 24, 2019 4:27 PM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] Resource Level Fencing Issue: DRBD9 with pacemaker/corosync

Hello all,

I ran into an issue while testing the behavior of resource-level fencing with pacemaker/corosync. Something seems to be wrong with the handling of fencing events when the cluster is quiescent, i.e. when nothing is written while the peers are disconnected.

I am new to DRBD, so I followed the procedure in the LINBIT documentation [1].

It would be great if the DRBD developers could point out a misconfiguration if my setup is wrong. Or is there a potential issue in this area?

App packages:
- OS
  - Ubuntu 18.04LTS
- DRBD packages
  - drbd-utils
    - 9.10.0-1ppa1~bionic1
  - drbd-dkms
    - 9.0.19~rc1-1ppa1~bionic1
- Pacemaker / Corosync
  - pacemaker
    - 1.1.18-0ubuntu1.1
  - corosync
    - 2.4.3-0ubuntu1.1

Network topology:

    +--------------+   <- (Redundancy for resource level fencing)
  node1         node2
    |              |
    +-------+------+   <- (DRBD data, NFS data)
            |
          node3

- node1
  - DRBD Primary
  - NFS Server
- node2
  - DRBD Secondary
  - NFS Server
- node3
  - NFS Client


Description of the Issue:
1. When I cut the connection between node1 and node2 using iptables,
   fencing was invoked as expected (a sketch of the iptables rules
   follows the log excerpts below).

   # node1 syslog output:
   Jun 24 05:51:22 node1 kernel: [17025.778428] drbd r0/0 drbd0 node2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
   Jun 24 05:51:22 node1 kernel: [17025.810749] drbd r0 node2: helper command: /sbin/drbdadm fence-peer
   Jun 24 05:51:23 node1 kernel: [17026.624983] drbd r0/0 drbd0 node2: pdsk( DUnknown -> Outdated )
   # node2 syslog output:
   Jun 24 05:51:22 node2 kernel: [17020.463472] drbd r0/0 drbd0 node1: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
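
   For reference, one way to cut only the replication link (a minimal
   sketch; it assumes replication runs over TCP port 7789 as configured
   in r0.res below):

   ----
   # on node1: drop DRBD replication traffic in both directions
   iptables -A INPUT  -p tcp --sport 7789 -j DROP
   iptables -A INPUT  -p tcp --dport 7789 -j DROP
   iptables -A OUTPUT -p tcp --dport 7789 -j DROP
   ----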

2. After I restored the connection, the corresponding unfencing was not
   invoked, because the pdsk state (at node1) transitioned directly from
   'Outdated' to 'UpToDate' without a resync. [2] says 'after-resync-target'
   is called on a resync target when the disk state changes from
   Inconsistent to Consistent as a resync finishes, so it seems natural
   that nothing happened here (see the constraint-check sketch after the
   log excerpts below).

   # node1 syslog output:
   Jun 24 05:53:19 node1 kernel: [17143.117883] drbd r0/0 drbd0 node2: pdsk( Outdated -> UpToDate ) repl( Off -> Established )
   # node2 syslog output:
   Jun 24 05:53:19 node2 kernel: [17137.789972] drbd r0/0 drbd0 node1: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
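
   Because unfencing never fires in this case, the location constraint
   added by crm-fence-peer.9.sh stays in the CIB and keeps the peer
   from being promoted. A sketch for spotting and removing it by hand
   (the constraint id is an assumption based on how the script names
   it):

   ----
   # look for a leftover fencing constraint
   sudo crm configure show | grep drbd-fence-by-handler
   # if it is still present after reconnecting, remove it manually,
   # e.g. (hypothetical id, use the one grep printed):
   sudo crm configure delete drbd-fence-by-handler-r0-ms_drbd_nfs
   ----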

Note that if something was written to the DRBD disk after step 1., unfencing was invoked, as the following logs show.

1'. # node1 syslog output:
    Jun 24 05:56:32 node1 kernel: [17336.114786] drbd r0/0 drbd0 node2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
    Jun 24 05:56:32 node1 kernel: [17336.146599] drbd r0 node2: helper command: /sbin/drbdadm fence-peer
    Jun 24 05:56:32 node1 kernel: [17336.186206] drbd r0/0 drbd0 node2: pdsk( DUnknown -> Outdated )
    # node2 syslog output:
    Jun 24 05:56:32 node2 kernel: [17330.741001] drbd r0/0 drbd0 node1: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )

2'. # node1 syslog output:
    Jun 24 06:01:54 node1 kernel: [17658.236522] drbd r0/0 drbd0 node2: pdsk( Outdated -> Consistent ) repl( Off -> WFBitMapS )
    Jun 24 06:01:54 node1 kernel: [17658.237470] drbd r0/0 drbd0 node2: pdsk( Consistent -> Outdated )
    Jun 24 06:01:54 node1 kernel: [17658.239750] drbd r0/0 drbd0 node2: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
    Jun 24 06:01:54 node1 kernel: [17658.307340] drbd r0/0 drbd0 node2: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
    Jun 24 06:01:54 node1 kernel: [17658.307445] drbd r0 node2: helper command: /sbin/drbdadm unfence-peer
    # node2 syslog output:
    Jun 24 06:01:54 node2 kernel: [17652.870284] drbd r0/0 drbd0 node1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
    Jun 24 06:01:54 node2 kernel: [17652.908634] drbd r0/0 drbd0: disk( Outdated -> Inconsistent )
    Jun 24 06:01:54 node2 kernel: [17652.908636] drbd r0/0 drbd0 node1: repl( WFBitMapT -> SyncTarget )
    Jun 24 06:01:54 node2 kernel: [17652.918096] drbd r0/0 drbd0: disk( Inconsistent -> UpToDate )
    Jun 24 06:01:54 node2 kernel: [17652.918098] drbd r0/0 drbd0 node1: repl( SyncTarget -> Established )
    Jun 24 06:01:54 node2 kernel: [17652.918221] drbd r0/0 drbd0 node1: helper command: /sbin/drbdadm after-resync-target
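
Any write while the peers are disconnected makes the bitmaps differ, so the reconnect goes through a real resync and the unfence handler fires. A minimal way to generate such a write (a sketch; the mount point /mnt/drbd comes from the fs_nfs primitive below):

----
# on node1 (Primary), while node2 is fenced off
dd if=/dev/zero of=/mnt/drbd/fence-test bs=4k count=1 oflag=sync
----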

Pacemaker Configuration (crm interactive):
----
sudo crm configure
  primitive drbd_nfs ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
  ms ms_drbd_nfs drbd_nfs \
    meta master-max="1" master-node-max="1" \
    clone-max="2" clone-node-max="1" \
    notify="true"
  primitive fs_nfs ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/r0/0" \
    directory="/mnt/drbd" fstype="xfs"
  primitive ip_nfs ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" nic="enp0s8"
  primitive nfsd lsb:nfs-kernel-server
  group nfs fs_nfs ip_nfs nfsd
  colocation nfs_on_drbd \
    inf: nfs ms_drbd_nfs:Master
  order nfs_after_drbd \
    inf: ms_drbd_nfs:promote nfs:start
  commit
  exit
----
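
A quick sanity check after committing (a sketch; both are standard pacemaker tools):

----
sudo crm_mon -1r          # one-shot cluster status, including inactive resources
sudo crm configure show   # dump the committed configuration
----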

DRBD Configuration (/etc/drbd.d/r0.res):
----
resource r0 {
        protocol C;

        # for resource level fencing
        net {
                fencing resource-only;
        }
        handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
        }
        #

        device /dev/drbd0;
        disk /dev/local/r0;
        meta-disk internal;
        on node1 {
                address 192.168.1.11:7789;
        }
        on node2 {
                address 192.168.1.12:7789;
        }
}
----
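
Not shown above: the initial bring-up of the resource. A minimal sketch of the usual steps (assuming the backing device /dev/local/r0 already exists on both nodes):

----
# on both nodes
sudo drbdadm create-md r0
sudo drbdadm up r0
# on node1 only: make it the initial sync source
sudo drbdadm primary --force r0
----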

Corosync Configuration (/etc/corosync/corosync.conf):
----
totem {
    version: 2
    cluster_name: drbd
    secauth: off
    transport: udpu
    rrp_mode: active
}

nodelist {
    node {
        ring0_addr: 192.168.1.11
        ring1_addr: 172.24.1.11
        nodeid: 1
    }

    node {
        ring0_addr: 192.168.1.12
        ring1_addr: 172.24.1.12
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
----
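
With rrp_mode: active, both rings should be healthy at the same time; a sketch for verifying ring and quorum state with the standard corosync tools:

----
sudo corosync-cfgtool -s     # status of ring 0 and ring 1
sudo corosync-quorumtool -s  # quorum state (two_node: 1 expected)
----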

[1] "14. Integrating DRBD with Pacemaker clusters",
    The DRBD9 and LINSTOR User’s Guide,
    https://docs.linbit.com/docs/users-guide-9.0/#ch-pacemaker
[2] Section handlers Parameters
    after-resync-target cmd
    https://docs.linbit.com/man/v9/drbd-conf-5/#Section_handlers_Parameters

Regards,
--
Takashi Sogabe <tasogabe at yahoo-corp.jp>