[DRBD-user] Pacemaker could not switch drbd nodes
Lozenkov Sergei
slozenkov at gmail.com
Thu Mar 22 23:01:51 CET 2018
Hello.
I have two Debian 9 servers with a configured Corosync-Pacemaker-DRBD stack. Everything
worked well for a month.
After some server issues (with reboots), Pacemaker can no longer switch the DRBD
node; it fails with errors like these:
Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice: operation_finished: drbd_nfs_stop_0:3667:stderr [ 1: State change failed: (-12) Device is held open by someone ]
Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice: operation_finished: drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84 secondary 1' terminated with exit code 11 ]
Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: info: log_finished: finished - rsc:drbd_nfs action:stop call_id:47 pid:3667 exit-code:1 exec-time:20002ms queue-time:0ms
Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: error: process_lrm_event: Result of stop operation for drbd_nfs on nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0 timeout=20000ms
Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: notice: process_lrm_event: nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [ 1: State change failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84 secondary 1' terminated with exit
I tried many recipes found by googling, but all attempts were unsuccessful.
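For example, I checked what might still be holding the DRBD device open on the
node (a rough sketch of the checks; /dev/drbd1 and /data are from my config below):

# is the device still mounted somewhere?
grep drbd1 /proc/mounts
# which processes keep it open?
fuser -vm /dev/drbd1
lsof /dev/drbd1
# current DRBD state on the node
cat /proc/drbd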
I also have another two-node cluster with exactly the same configuration, and it
works without any issues.
Right now I have placed the nodes in standby mode and brought all services up manually.
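Roughly what that looks like (a sketch, assuming crmsh and the resource/mount
names from my config below):

# keep the passive node out of the game
crm node standby nfs02-az-eus.tech-corps.com
# on the active node, bring the stack up by hand
drbdadm up nfs
drbdadm primary nfs
mount /dev/drbd1 /data
systemctl start nfs-kernel-server nmbd smbd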
Could you please help me analyze and solve this problem?
Thanks
Here are my configuration files:
--- CRM CONFIG ---
crm configure show
node 171049224: nfs01-az-eus.tech-corps.com \
        attributes standby=off
node 171049225: nfs02-az-eus.tech-corps.com \
        attributes standby=on
primitive drbd_nfs ocf:linbit:drbd \
        params drbd_resource=nfs \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
primitive fs_nfs Filesystem \
        params device="/dev/drbd1" directory="/data" fstype=ext4 \
        meta is-managed=true
primitive nfs lsb:nfs-kernel-server \
        op monitor interval=5s
primitive nmbd lsb:nmbd \
        op monitor interval=5s
primitive smbd lsb:smbd \
        op monitor interval=5s
group NFS fs_nfs nfs nmbd smbd
ms ms_drbd_nfs drbd_nfs \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
order nmbd-before-smbd inf: nmbd:start smbd:start
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=false \
        no-quorum-policy=ignore
--- DRBD GLOBAL ---
cat /etc/drbd.d/global_common.conf | grep -v '#'
global {
        usage-count no;
}
common {
        protocol C;
        handlers {
        }
        startup {
        }
        options {
        }
        disk {
        }
        net {
        }
}
--- DRBD RESOURCE ---
cat /etc/drbd.d/nfs.res | grep -v '#'
resource nfs {
        meta-disk internal;
        device /dev/drbd1;
        syncer {
                verify-alg sha1;
                rate 100M;
        }
        net {
                max-buffers 8000;
                max-epoch-size 8000;
                unplug-watermark 16;
                sndbuf-size 0;
        }
        disk {
                disk-barrier no;
                disk-flushes no;
        }
        on nfs01-az-eus.tech-corps.com {
                disk /dev/sdc1;
                address 10.50.1.8:7789;
        }
        on nfs02-az-eus.tech-corps.com {
                disk /dev/sdc1;
                address 10.50.1.9:7789;
        }
}
--
Sergey L