[DRBD-user] Pacemaker could not switch drbd nodes
Lozenkov Sergei
slozenkov at gmail.com
Tue Mar 27 07:44:44 CEST 2018
Thank you.
The problem was at the Pacemaker level. It was solved by following this article:
http://blog.clusterlabs.org/blog/2009/why-wont-the-cluster-start-my-services
(e.g. crm_resource --cleanup --node nagios-clu2)
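For the record, the recovery sequence looked roughly like this. This is a sketch assuming the resource and node names from this thread; adjust to your own cluster:

```shell
# Check what still holds the DRBD device open -- the "(-12) Device is
# held open by someone" error usually means a mount, NFS export, or
# process is still pinning /dev/drbd1:
fuser -vm /dev/drbd1
lsof /dev/drbd1

# Once the holder is gone, clear the failed stop action so Pacemaker
# recomputes the resource state instead of keeping the failure record:
crm_resource --cleanup --resource drbd_nfs --node nfs01-az-eus.tech-corps.com

# Bring a standby node back into the cluster:
crm node online nfs02-az-eus.tech-corps.com
```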
2018-03-27 9:35 GMT+04:00 Igor Cicimov <igorc at encompasscorporation.com>:
> Hi,
>
> On Fri, Mar 23, 2018 at 9:01 AM, Lozenkov Sergei <slozenkov at gmail.com>
> wrote:
>
>> Hello.
>> I have two Debian 9 servers configured with Corosync, Pacemaker and DRBD.
>> Everything worked well for a month.
>> After some server issues (with reboots), Pacemaker could no longer switch
>> the DRBD node, failing with errors like these:
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
>> operation_finished: drbd_nfs_stop_0:3667:stderr [ 1: State change
>> failed: (-12) Device is held open by someone ]
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
>> operation_finished: drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84
>> secondary 1' terminated with exit code 11 ]
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: info:
>> log_finished: finished - rsc:drbd_nfs action:stop call_id:47 pid:3667
>> exit-code:1 exec-time:20002ms queue-time:0ms
>>
>> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: error:
>> process_lrm_event: Result of stop operation for drbd_nfs on
>> nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0
>> timeout=20000ms
>>
>> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: notice:
>> process_lrm_event: nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [
>> 1: State change failed: (-12) Device is held open by someone\nCommand
>> 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change
>> failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84
>> secondary 1' terminated with exit code 11\n1: State change failed: (-12)
>> Device is held open by someone\nCommand 'drbdsetup-84 secondary 1'
>> terminated with exit
>>
>> I tried to resolve the issue with many recipes found online, but all
>> attempts were unsuccessful.
>> I also have another two-node cluster with exactly the same configuration,
>> and it works without any issues.
>>
>> For now I have placed the nodes in standby mode and started all services
>> manually.
>> Could you please help me analyze and solve the problem?
>> Thanks
>>
>> Here are my configuration files:
>> --- CRM CONFIG ---
>> crm configure show
>> node 171049224: nfs01-az-eus.tech-corps.com \
>> attributes standby=off
>> node 171049225: nfs02-az-eus.tech-corps.com \
>> attributes standby=on
>> primitive drbd_nfs ocf:linbit:drbd \
>> params drbd_resource=nfs \
>> op monitor interval=29s role=Master \
>> op monitor interval=31s role=Slave
>> primitive fs_nfs Filesystem \
>> params device="/dev/drbd1" directory="/data" fstype=ext4 \
>> meta is-managed=true
>> primitive nfs lsb:nfs-kernel-server \
>> op monitor interval=5s
>> primitive nmbd lsb:nmbd \
>> op monitor interval=5s
>> primitive smbd lsb:smbd \
>> op monitor interval=5s
>> group NFS fs_nfs nfs nmbd smbd
>> ms ms_drbd_nfs drbd_nfs \
>> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
>> notify=true
>> order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
>> order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
>> order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
>> order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
>> colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
>> order nmbd-before-smbd inf: nmbd:start smbd:start
>> property cib-bootstrap-options: \
>> have-watchdog=false \
>> dc-version=1.1.16-94ff4df \
>> cluster-infrastructure=corosync \
>> cluster-name=debian \
>> stonith-enabled=false \
>> no-quorum-policy=ignore
>>
>>
>>
>> --- DRBD GLOBAL ---
>> cat /etc/drbd.d/global_common.conf | grep -v '#'
>>
>> global {
>> usage-count no;
>> }
>>
>> common {
>> protocol C;
>>
>> handlers {
>>
>> }
>>
>> startup {
>> }
>>
>> options {
>> }
>>
>> disk {
>> }
>>
>> net {
>> }
>> }
>>
>>
>> --- DRBD -RESOURCE ---
>> cat /etc/drbd.d/nfs.res | grep -v '#'
>> resource nfs{
>> meta-disk internal;
>> device /dev/drbd1;
>> syncer {
>> verify-alg sha1;
>> rate 100M;
>> }
>>
>> net{
>> max-buffers 8000;
>> max-epoch-size 8000;
>> unplug-watermark 16;
>> sndbuf-size 0;
>> }
>>
>> disk{
>> disk-barrier no;
>> disk-flushes no;
>> }
>>
>> on nfs01-az-eus.tech-corps.com{
>> disk /dev/sdc1;
>> address 10.50.1.8:7789;
>> }
>>
>> on nfs02-az-eus.tech-corps.com{
>> disk /dev/sdc1;
>> address 10.50.1.9:7789;
>> }
>> }
>>
>>
>>
>>
>> --
>> Sergey L
>>
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>>
> Did you check with fuser what is holding the device/filesystem busy?
>
>
--
Sergei Lozenkov