[DRBD-user] Pacemaker could not switch drbd nodes

Lozenkov Sergei slozenkov at gmail.com
Tue Mar 27 07:44:44 CEST 2018


Thank you.
The problem was at the Pacemaker level. It was solved by following this article:
http://blog.clusterlabs.org/blog/2009/why-wont-the-cluster-start-my-services
(e.g. crm_resource --cleanup --node nagios-clu2)
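For reference, the kind of cleanup the article describes looks roughly like
this (a sketch; the resource name ms_drbd_nfs and the node name are taken
from the configuration quoted below, so adjust them to your own cluster):

```shell
# One-shot cluster status, including per-resource fail counts
crm_mon -1f

# Clear the recorded failure history so Pacemaker will try the
# resource again on that node instead of keeping it stopped
crm_resource --cleanup --resource ms_drbd_nfs --node nfs01-az-eus.tech-corps.com
```

After the cleanup, Pacemaker recalculates placement and will attempt to
start or promote the resource again on the cleaned node.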


2018-03-27 9:35 GMT+04:00 Igor Cicimov <igorc at encompasscorporation.com>:

> Hi,
>
> On Fri, Mar 23, 2018 at 9:01 AM, Lozenkov Sergei <slozenkov at gmail.com>
> wrote:
>
>> Hello.
>> I have two Debian 9 servers configured with Corosync, Pacemaker, and DRBD.
>> Everything worked well for a month.
>> After some server issues (with reboots), Pacemaker can no longer switch
>> the DRBD node, failing with the following errors:
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com       lrmd:   notice:
>> operation_finished:     drbd_nfs_stop_0:3667:stderr [ 1: State change
>> failed: (-12) Device is held open by someone ]
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com       lrmd:   notice:
>> operation_finished:     drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84
>> secondary 1' terminated with exit code 11 ]
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com       lrmd:     info:
>> log_finished:   finished - rsc:drbd_nfs action:stop call_id:47 pid:3667
>> exit-code:1 exec-time:20002ms queue-time:0ms
>>
>> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com       crmd:    error:
>> process_lrm_event:      Result of stop operation for drbd_nfs on
>> nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0
>> timeout=20000ms
>>
>> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com       crmd:   notice:
>> process_lrm_event:      nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [
>> 1: State change failed: (-12) Device is held open by someone\nCommand
>> 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change
>> failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84
>> secondary 1' terminated with exit code 11\n1: State change failed: (-12)
>> Device is held open by someone\nCommand 'drbdsetup-84 secondary 1'
>> terminated with exit
>>
>> I tried to resolve the issue with many recipes found online, but all
>> attempts were unsuccessful.
>> I also have another two-node cluster with exactly the same configuration,
>> and it works without any issues.
>>
>> For now I have placed the nodes in standby mode and brought up all the
>> services manually.
>> Please could you help me analyze and solve the problem?
>> Thanks
>>
>> Here are my configuration files:
>> --- CRM CONFIG ---
>> crm configure show
>> node 171049224: nfs01-az-eus.tech-corps.com \
>>         attributes standby=off
>> node 171049225: nfs02-az-eus.tech-corps.com \
>>         attributes standby=on
>> primitive drbd_nfs ocf:linbit:drbd \
>>         params drbd_resource=nfs \
>>         op monitor interval=29s role=Master \
>>         op monitor interval=31s role=Slave
>> primitive fs_nfs Filesystem \
>>         params device="/dev/drbd1" directory="/data" fstype=ext4 \
>>         meta is-managed=true
>> primitive nfs lsb:nfs-kernel-server \
>>         op monitor interval=5s
>> primitive nmbd lsb:nmbd \
>>         op monitor interval=5s
>> primitive smbd lsb:smbd \
>>         op monitor interval=5s
>> group NFS fs_nfs nfs nmbd smbd
>> ms ms_drbd_nfs drbd_nfs \
>>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
>> notify=true
>> order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
>> order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
>> order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
>> order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
>> colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
>> order nmbd-before-smbd inf: nmbd:start smbd:start
>> property cib-bootstrap-options: \
>>         have-watchdog=false \
>>         dc-version=1.1.16-94ff4df \
>>         cluster-infrastructure=corosync \
>>         cluster-name=debian \
>>         stonith-enabled=false \
>>         no-quorum-policy=ignore
>>
>>
>>
>> --- DRBD GLOBAL ---
>> cat /etc/drbd.d/global_common.conf | grep -v '#'
>>
>> global {
>>         usage-count no;
>> }
>>
>> common {
>>         protocol C;
>>
>>         handlers {
>>
>>         }
>>
>>         startup {
>>         }
>>
>>         options {
>>         }
>>
>>         disk {
>>         }
>>
>>         net {
>>         }
>> }
>>
>>
>> --- DRBD -RESOURCE ---
>> cat /etc/drbd.d/nfs.res | grep -v '#'
>> resource nfs {
>>   meta-disk internal;
>>   device /dev/drbd1;
>>   syncer {
>>     verify-alg sha1;
>>     rate 100M;
>>   }
>>
>>   net {
>>     max-buffers 8000;
>>     max-epoch-size 8000;
>>     unplug-watermark 16;
>>     sndbuf-size 0;
>>   }
>>
>>   disk {
>>     disk-barrier no;
>>     disk-flushes no;
>>   }
>>
>>   on nfs01-az-eus.tech-corps.com {
>>     disk /dev/sdc1;
>>     address 10.50.1.8:7789;
>>   }
>>
>>   on nfs02-az-eus.tech-corps.com {
>>     disk /dev/sdc1;
>>     address 10.50.1.9:7789;
>>   }
>> }
>>
>>
>>
>>
>> --
>> Segey L
>>
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>>
> Did you check with fuser what is holding the device/filesystem busy?
>
>
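For the archives, Igor's fuser suggestion amounts to something like this
(device and mount point taken from the configuration quoted above):

```shell
# Show which processes hold the DRBD block device open
# (-v lists PID, user, and command name)
fuser -v /dev/drbd1

# Show which processes still have files open on the mounted filesystem
# (-m treats the argument as a mount point)
fuser -vm /data

# lsof gives the same picture with full per-process detail
lsof /dev/drbd1
```

In a setup like this one, a likely culprit is the NFS kernel server or a
stale mount still pinning the device; once that is stopped or unmounted,
demoting the resource to secondary can succeed.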


-- 

Lozenkov Sergei