[DRBD-user] Pacemaker could not switch drbd nodes
Igor Cicimov
igorc at encompasscorporation.com
Tue Mar 27 07:35:45 CEST 2018
Hi,
On Fri, Mar 23, 2018 at 9:01 AM, Lozenkov Sergei <slozenkov at gmail.com>
wrote:
> Hello.
> I have two Debian 9 servers running Corosync, Pacemaker and DRBD.
> Everything worked well for a month.
> After some server issues (including reboots), Pacemaker can no longer
> switch the DRBD node and fails with errors like these:
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
> operation_finished: drbd_nfs_stop_0:3667:stderr [ 1: State change
> failed: (-12) Device is held open by someone ]
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
> operation_finished: drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84
> secondary 1' terminated with exit code 11 ]
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: info:
> log_finished: finished - rsc:drbd_nfs action:stop call_id:47 pid:3667
> exit-code:1 exec-time:20002ms queue-time:0ms
>
> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: error:
> process_lrm_event: Result of stop operation for drbd_nfs on
> nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0
> timeout=20000ms
>
> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: notice:
> process_lrm_event: nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [
> 1: State change failed: (-12) Device is held open by someone\nCommand
> 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change
> failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84
> secondary 1' terminated with exit code 11\n1: State change failed: (-12)
> Device is held open by someone\nCommand 'drbdsetup-84 secondary 1'
> terminated with exit
>
> I tried to resolve the issue with many recipes I found via Google, but all
> attempts were unsuccessful.
> I also have another two-node cluster with exactly the same
> configuration, and it works without any issues.
>
> For now I have put the nodes into standby mode and started all services
> manually.
> Could you please help me analyze and solve the problem?
> Thanks
>
> Here are my configuration files:
> --- CRM CONFIG ---
> crm configure show
> node 171049224: nfs01-az-eus.tech-corps.com \
> attributes standby=off
> node 171049225: nfs02-az-eus.tech-corps.com \
> attributes standby=on
> primitive drbd_nfs ocf:linbit:drbd \
> params drbd_resource=nfs \
> op monitor interval=29s role=Master \
> op monitor interval=31s role=Slave
> primitive fs_nfs Filesystem \
> params device="/dev/drbd1" directory="/data" fstype=ext4 \
> meta is-managed=true
> primitive nfs lsb:nfs-kernel-server \
> op monitor interval=5s
> primitive nmbd lsb:nmbd \
> op monitor interval=5s
> primitive smbd lsb:smbd \
> op monitor interval=5s
> group NFS fs_nfs nfs nmbd smbd
> ms ms_drbd_nfs drbd_nfs \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
> order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
> order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
> order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
> colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
> order nmbd-before-smbd inf: nmbd:start smbd:start
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=1.1.16-94ff4df \
> cluster-infrastructure=corosync \
> cluster-name=debian \
> stonith-enabled=false \
> no-quorum-policy=ignore
>
>
>
> --- DRBD GLOBAL ---
> cat /etc/drbd.d/global_common.conf | grep -v '#'
>
> global {
> usage-count no;
> }
>
> common {
> protocol C;
>
> handlers {
>
> }
>
> startup {
> }
>
> options {
> }
>
> disk {
> }
>
> net {
> }
> }
>
>
> --- DRBD -RESOURCE ---
> cat /etc/drbd.d/nfs.res | grep -v '#'
> resource nfs{
> meta-disk internal;
> device /dev/drbd1;
> syncer {
> verify-alg sha1;
> rate 100M;
> }
>
> net{
> max-buffers 8000;
> max-epoch-size 8000;
> unplug-watermark 16;
> sndbuf-size 0;
> }
>
> disk{
> disk-barrier no;
> disk-flushes no;
> }
>
> on nfs01-az-eus.tech-corps.com{
> disk /dev/sdc1;
> address 10.50.1.8:7789;
> }
>
> on nfs02-az-eus.tech-corps.com{
> disk /dev/sdc1;
> address 10.50.1.9:7789;
> }
> }
>
>
>
>
> --
> Segey L
>
Did you check with fuser what is holding the device/filesystem busy?
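
Something along these lines is what I have in mind. It is only a rough
sketch: the function name show_holders is made up, and /dev/drbd1 and /data
are taken from your posted config. Besides fuser it also looks at
/proc/mounts, the sysfs "holders" directory and the active NFS exports,
because as far as I know a filesystem exported by the kernel NFS server can
stay busy without any user-space process showing up in fuser.

```shell
# Sketch only: report what keeps a DRBD device busy.
# Run as root on the node where the demote/stop fails.
show_holders() {
    # $1 = DRBD block device, $2 = mount point
    if [ ! -b "$1" ]; then
        echo "no block device at $1"
        return 1
    fi
    echo "== mounts of $1 =="
    grep "^$1 " /proc/mounts
    echo "== processes using files under $2 =="
    fuser -vm "$2" 2>&1
    echo "== processes holding $1 open =="
    fuser -v "$1" 2>&1
    echo "== kernel holders (e.g. device-mapper stacked on top) =="
    ls "/sys/class/block/${1##*/}/holders" 2>/dev/null
    echo "== active NFS exports (kernel nfsd is not listed by fuser) =="
    exportfs -v 2>/dev/null
    return 0
}
```

Usage on the node that fails to demote would be:

    show_holders /dev/drbd1 /data

If /data is still listed as mounted or exported at the time Pacemaker runs
the drbd_nfs stop, that would explain the "Device is held open by someone"
message.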