Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
At 2016-09-27 15:47:37, "Nick Wang" <nwang at suse.com> wrote:
>>>> On 2016-9-26 at 19:17, in message
><CACp6BS7W6PyW=453WkrRFGSZ+f0mqHqL2m9FjSXFicOCQ+wiwA at mail.gmail.com>, Igor
>Cicimov <igorc at encompasscorporation.com> wrote:
>> On 26 Sep 2016 7:26 pm, "mzlld1988" <mzlld1988 at 163.com> wrote:
>> >
>> > I apply the attached patch file to scripts/drbd.ocf,then pacemaker can
>> start drbd successfully,but only two nodes, the third node's drbd is
>> down,is it right?
>> Well, you didn't say you have 3 nodes. Usually you use pacemaker with 2 nodes
>> and drbd.
>The patch is supposed to help in the 3 (or more) node scenario, as long as there is only one Primary.
>Is the 3-node DRBD cluster working without pacemaker? And how did you
>configure it in pacemaker?
According to http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf, I executed the following commands to configure drbd in pacemaker.
[root at pcmk-1 ~]# pcs cluster cib drbd_cfg
[root at pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \
drbd_resource=wwwdata op monitor interval=60s
[root at pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true
[root at pcmk-1 ~]# pcs cluster cib-push drbd_cfg
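Note that clone-max=2 in the resource master command above caps the master/slave set at two instances, which would explain why DRBD only comes up on two of the three nodes under pacemaker. A minimal sketch of the adjustment for a 3-node, single-Primary setup (keeping master-max=1) would just raise that limit on the existing clone:
[root at pcmk-1 ~]# pcs cluster cib drbd_cfg
[root at pcmk-1 ~]# pcs -f drbd_cfg resource meta WebDataClone clone-max=3   # allow an instance on all three nodes
[root at pcmk-1 ~]# pcs cluster cib-push drbd_cfg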
>> > And another question is, can pacemaker successfully stop the slave
>> > node? My result is that pacemaker can't stop the slave node.
>> >
>Yes, you need to check the log to see which resource prevents pacemaker from stopping.
Pacemaker can't stop the slave node's drbd. I think the reason may be the same as in my previous email (see attached file), but no one replied to that email.
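Besides the pcs status output below, the DRBD-level view during the stop attempt can be checked directly on each node; a quick sketch (the resource is r0, as in the log messages further down):
[root at drbd ~]# drbdadm status r0                         # DRBD 9: role, disk and connection state per peer
[root at drbd ~]# grep drbd /var/log/messages | tail -n 50  # resource agent stderr and kernel state-change messages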
[root at drbd ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Mon Sep 26 04:36:50 2016 Last change: Mon Sep 26 04:36:49 2016 by root via cibadmin on drbd.node101
3 nodes and 2 resources configured
Online: [ drbd.node101 drbd.node102 drbd.node103 ]
Full list of resources:
Master/Slave Set: WebDataClone [WebData]
Masters: [ drbd.node102 ]
Slaves: [ drbd.node101 ]
Daemon Status:
corosync: active/corosync.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig corosync --level=5
enabled
pacemaker: active/pacemaker.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig pacemaker --level=5
enabled
pcsd: active/enabled
-------------------------------------------------------------
Failed to execute 'pcs cluster stop drbd.node101'
= Error message on drbd.node101 (secondary node)
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ r0: State change failed: (-10) State change was refused by peer node ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ additional info from kernel: ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ failed to disconnect ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]
Sep 26 04:39:26 drbd crmd[3524]: error: Result of stop operation for WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0 timeout=100000ms
Sep 26 04:39:26 drbd crmd[3524]: notice: drbd.node101-WebData_stop_0:12 [ r0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed t
= Error message on drbd.node102 (primary node)
Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote state change 3578772780 (primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFFB)
Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to be Primary while peer is not outdated
Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing)
Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Aborting remote state change 3578772780
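For what it's worth, "susp-io( no -> fencing )" together with "Refusing to be Primary while peer is not outdated" is what one would expect when a resource-level fencing policy is configured: the Primary refuses the Secondary's disconnect until the peer's disk can be marked Outdated (normally done by the crm-fence-peer handler via a constraint in the CIB). A hedged sketch of the kind of drbd.conf this implies, shown here with the option placement used in DRBD 9 examples (check drbd.conf(5) for your version):
resource r0 {
    net {
        fencing resource-only;    # a fencing policy makes disconnects depend on outdating the peer
    }
    handlers {
        # scripts shipped with drbd-utils that add/remove a Pacemaker location constraint
        fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";
        unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }
    # ... existing volume/on-host sections unchanged ...
}
If that matches the running configuration, this stop failure and the question in the attached earlier mail ("Is it normal that we can't directly remove the secondary node when fencing is set?") would be the same issue.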
>> > I'm looking forward to your answers. Thanks.
>> >
>> Yes, it works with 2-node drbd9 configured in the standard way, not via
>> drbdmanage. Haven't tried any other layout.
>>
>> >
>
>Best regards,
>Nick
-------------- next part --------------
An embedded message was scrubbed...
From: 刘丹 <mzlld1988 at 163.com>
Subject: Is it normal that we can't directly remove the secondary node when
fencing is set?
Date: Sat, 10 Sep 2016 14:46:01 +0800 (CST)
Size: 15557
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20160927/628f1619/attachment.eml>