[DRBD-user] Pacemaker could not switch drbd nodes
Lozenkov Sergei
slozenkov at gmail.com
Thu Mar 22 23:01:51 CET 2018
Hello.
I have two Debian 9 servers with a configured Corosync-Pacemaker-DRBD stack. Everything
worked well for a month.
After some server issues (with reboots), Pacemaker can no longer switch the DRBD
node; it fails with errors like these:
Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice: operation_finished: drbd_nfs_stop_0:3667:stderr [ 1: State change failed: (-12) Device is held open by someone ]
Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice: operation_finished: drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84 secondary 1' terminated with exit code 11 ]
Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: info: log_finished: finished - rsc:drbd_nfs action:stop call_id:47 pid:3667 exit-code:1 exec-time:20002ms queue-time:0ms
Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: error: process_lrm_event: Result of stop operation for drbd_nfs on nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0 timeout=20000ms
Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: notice: process_lrm_event: nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [ 1: State change failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84 secondary 1' terminated with exit
I tried many recipes found by googling, but all attempts were unsuccessful.
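For example, I checked what might still be holding the DRBD device open on the
node (a rough sketch of the checks; /dev/drbd1 and /data are from my config below):

# is the device still mounted somewhere?
grep drbd1 /proc/mounts
# which processes keep it open?
fuser -vm /dev/drbd1
lsof /dev/drbd1
# current DRBD state on the node
cat /proc/drbd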
I also have another two-node cluster with exactly the same configuration, and it
works without any issues.
Right now I have placed the nodes in standby mode and brought all services up manually.
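Roughly what that looks like (a sketch, assuming crmsh and the resource/mount
names from my config below):

# keep the passive node out of the game
crm node standby nfs02-az-eus.tech-corps.com
# on the active node, bring the stack up by hand
drbdadm up nfs
drbdadm primary nfs
mount /dev/drbd1 /data
systemctl start nfs-kernel-server nmbd smbd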
Could you please help me analyze and solve this problem?
Thanks
Here are my configuration files:
--- CRM CONFIG ---
crm configure show
node 171049224: nfs01-az-eus.tech-corps.com \
        attributes standby=off
node 171049225: nfs02-az-eus.tech-corps.com \
        attributes standby=on
primitive drbd_nfs ocf:linbit:drbd \
        params drbd_resource=nfs \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
primitive fs_nfs Filesystem \
        params device="/dev/drbd1" directory="/data" fstype=ext4 \
        meta is-managed=true
primitive nfs lsb:nfs-kernel-server \
        op monitor interval=5s
primitive nmbd lsb:nmbd \
        op monitor interval=5s
primitive smbd lsb:smbd \
        op monitor interval=5s
group NFS fs_nfs nfs nmbd smbd
ms ms_drbd_nfs drbd_nfs \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
order nmbd-before-smbd inf: nmbd:start smbd:start
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=false \
        no-quorum-policy=ignore
--- DRBD GLOBAL ---
cat /etc/drbd.d/global_common.conf | grep -v '#'
global {
        usage-count no;
}
common {
        protocol C;
        handlers {
        }
        startup {
        }
        options {
        }
        disk {
        }
        net {
        }
}
--- DRBD RESOURCE ---
cat /etc/drbd.d/nfs.res | grep -v '#'
resource nfs {
        meta-disk internal;
        device /dev/drbd1;
        syncer {
                verify-alg sha1;
                rate 100M;
        }
        net {
                max-buffers 8000;
                max-epoch-size 8000;
                unplug-watermark 16;
                sndbuf-size 0;
        }
        disk {
                disk-barrier no;
                disk-flushes no;
        }
        on nfs01-az-eus.tech-corps.com {
                disk /dev/sdc1;
                address 10.50.1.8:7789;
        }
        on nfs02-az-eus.tech-corps.com {
                disk /dev/sdc1;
                address 10.50.1.9:7789;
        }
}
--
Sergey L