<p dir="ltr">On 27 Sep 2016 11:51 pm, "刘丹" <<a href="mailto:mzlld1988@163.com">mzlld1988@163.com</a>> wrote:<br>
><br>
> Failed to send to one or more email servers, so sending again.<br>
><br>
><br>
><br>
> At 2016-09-27 15:47:37, "Nick Wang" <<a href="mailto:nwang@suse.com">nwang@suse.com</a>> wrote:<br>
> >>>> On 2016-9-26 at 19:17, in message <CACp6BS7W6PyW=453WkrRFGSZ+f0mqHqL2m9FjSXFicOCQ+wiwA@mail.gmail.com>, Igor Cicimov <<a href="mailto:igorc@encompasscorporation.com">igorc@encompasscorporation.com</a>> wrote:<br>
> >> On 26 Sep 2016 7:26 pm, "mzlld1988" <<a href="mailto:mzlld1988@163.com">mzlld1988@163.com</a>> wrote:<br>
> >> > I applied the attached patch file to scripts/drbd.ocf; then pacemaker can start drbd successfully, but only on two nodes, and the third node's drbd is down. Is that right?<br>
> >> Well, you didn't say you have 3 nodes. Usually you use pacemaker with 2 nodes and drbd.<br>
> > The patch is supposed to help in the 3-node (or more) scenario, as long as there is only one Primary.<br>
> > Is the 3-node DRBD cluster working without pacemaker? And how did you configure it in pacemaker?<br>
> According to <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>, I executed the following commands to configure drbd in pacemaker.<br>
> [root@pcmk-1 ~]# pcs cluster cib drbd_cfg<br>
> [root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \<br>
> drbd_resource=wwwdata op monitor interval=60s<br>
> [root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \<br>
> master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \<br>
> notify=true<br>
You need clone-max=3 for 3 nodes.</p>
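<p dir="ltr">Something like this should define the master/slave set for 3 nodes. It is the same configuration as in your commands above with only clone-max changed; an untested sketch, adjust to your setup:</p>
<p dir="ltr">[root@pcmk-1 ~]# pcs cluster cib drbd_cfg<br>
[root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \<br>
drbd_resource=wwwdata op monitor interval=60s<br>
[root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \<br>
master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \<br>
notify=true<br>
[root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg</p>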
<p dir="ltr">> [root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg<br>
> >> > And another question: can pacemaker successfully stop the slave node? My result is that pacemaker can't stop the slave node.<br>
> > Yes; you need to check the log for which resource prevents pacemaker from stopping.<br>
><br>
> Pacemaker can't stop the slave node's drbd. I think the reason may be the same as in my previous email (see attached file), but no one replied to that email.<br>
> [root@drbd ~]# pcs status<br>
> Cluster name: mycluster<br>
> Stack: corosync<br>
> Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum<br>
> Last updated: Mon Sep 26 04:36:50 2016 Last change: Mon Sep 26 04:36:49 2016 by root via cibadmin on drbd.node101<br>
><br>
> 3 nodes and 2 resources configured<br>
><br>
> Online: [ drbd.node101 drbd.node102 drbd.node103 ]<br>
><br>
> Full list of resources:<br>
><br>
> Master/Slave Set: WebDataClone [WebData]<br>
> Masters: [ drbd.node102 ]<br>
> Slaves: [ drbd.node101 ]<br>
><br>
> Daemon Status:<br>
> corosync: active/corosync.service is not a native service, redirecting to /sbin/chkconfig.<br>
> Executing /sbin/chkconfig corosync --level=5<br>
> enabled<br>
> pacemaker: active/pacemaker.service is not a native service, redirecting to /sbin/chkconfig.<br>
> Executing /sbin/chkconfig pacemaker --level=5<br>
> enabled<br>
> pcsd: active/enabled<br>
><br>
> -------------------------------------------------------------<br>
> Failed to execute ‘pcs cluster stop drbd.node101’<br>
><br>
> = Error message on drbd.node101 (secondary node)<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ r0: State change failed: (-10) State change was refused by peer node ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ additional info from kernel: ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ failed to disconnect ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]<br>
> Sep 26 04:39:26 drbd crmd[3524]: error: Result of stop operation for WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0 timeout=100000ms<br>
> Sep 26 04:39:26 drbd crmd[3524]: notice: drbd.node101-WebData_stop_0:12 [ r0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed t<br>
> = Error message on drbd.node102 (primary node)<br>
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote state change 3578772780 (primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFFB)<br>
> Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to be Primary while peer is not outdated<br>
> Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing)<br>
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn( Connected -> TearDown ) peer( Secondary -> Unknown )<br>
> Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )<br>
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Aborting remote state change 3578772780<br>
> >> > I'm looking forward to your answers. Thanks.<br>
> >> Yes, it works with 2 nodes, drbd9 configured in the standard way, not via drbdmanage. Haven't tried any other layout.<br>
> > Best regards,<br>
> > Nick<br>
><br>
><br>
> ---------- Forwarded message ----------<br>
> From: "刘丹" <<a href="mailto:mzlld1988@163.com">mzlld1988@163.com</a>><br>
> To: drbd-user <<a href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a>><br>
> Cc: <br>
> Date: Sat, 10 Sep 2016 14:46:01 +0800 (CST)<br>
> Subject: Is it normal that we can't directly remove the secondary node when fencing is set?<br>
> Hi everyone,<br>
> I have a question about removing the secondary node of a DRBD 9 cluster.<br>
> When fencing is set, is it normal that we can't remove the secondary node with DRBD 9, while the same operation succeeds with DRBD 8.4.6?<br>
> The DRBD kernel source is the newest version (9.0.4-1); the DRBD utils version is 8.9.6.<br>
> Description:<br>
> 3 nodes, one of them Primary, disk state UpToDate, fencing set.<br>
> I get the error 'State change failed: (-7) State change was refused by peer node' when executing 'drbdadm down <res-name>' on any of the secondary nodes.<br>
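> A minimal reproduction sketch of the above (assuming the resource is named r0, which is not stated here, and that fencing is configured):<br>
> # on a Secondary node while another node is still Primary<br>
> [root@drbd ~]# drbdadm down r0<br>
> r0: State change failed: (-7) State change was refused by peer node<br>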
><br>
> Analysis:<br>
> When the down command is executed on one of the secondary nodes, that node calls change_cluster_wide_state() in drbd_state.c:<br>
> change_cluster_wide_state()<br>
> {<br>
> ...<br>
>     if (have_peers) {<br>
>         if (wait_event_timeout(resource->state_wait,<br>
>                                cluster_wide_reply_ready(resource),<br>
>                                twopc_timeout(resource))) {  /* ① wait for the peer nodes to reply; the thread sleeps until the replies arrive */<br>
>             rv = get_cluster_wide_reply(resource);          /* ② get the reply info */<br>
>         } else {<br>
>         }<br>
>     }<br>
> ...<br>
> }<br>
><br>
> Process ①<br>
> The primary node executes the following call chain:<br>
> ... -> try_state_change() -> is_valid_soft_transition() -> __is_valid_soft_transition()<br>
> Finally, __is_valid_soft_transition() returns the error code SS_PRIMARY_NOP:<br>
><br>
> if (peer_device->connection->fencing_policy >= FP_RESOURCE &&<br>
> !(role[OLD] == R_PRIMARY && repl_state[OLD] < L_ESTABLISHED && !(peer_disk_state[OLD] <= D_OUTDATED)) &&<br>
> (role[NEW] == R_PRIMARY && repl_state[NEW] < L_ESTABLISHED && !(peer_disk_state[NEW] <= D_OUTDATED)))<br>
> return SS_PRIMARY_NOP;<br>
><br>
> The primary node then replies with the packet P_TWOPC_NO, and on receiving that reply the secondary node sets the connection status to TWOPC_NO.<br>
> At this point, process ① finishes.<br>
><br>
> Process ②<br>
> rv is set to SS_CW_FAILED_BY_PEER, so the down command on the secondary node fails.<br>
> <br>
> ==== DRBD 8.4.6 ====<br>
> One node is Primary, the other is Secondary.<br>
> When 'drbdadm down <res-name>' is executed on the secondary node, the same error is written to the log on the first state-change attempt, which tries to set the peer disk to D_UNKNOWN.<br>
> But the command succeeds because the second attempt changes the peer disk to D_OUTDATED instead.<br>
> <br>
> The following code reports the error:<br>
> is_valid_state()<br>
> {<br>
> ...<br>
>     if (fp >= FP_RESOURCE &&<br>
>         ns.role == R_PRIMARY && ns.conn < C_CONNECTED && ns.pdsk >= D_UNKNOWN /* ① */) {<br>
>         rv = SS_PRIMARY_NOP;<br>
>     }<br>
> ...<br>
> }<br>
> <br>
> After executing 'drbdadm down <res-name>' on the secondary node, the status of the primary node is:<br>
> [root@drbd846 drbd-8.4.6]# cat /proc/drbd<br>
> version: 8.4.6 (api:1/proto:86-101)<br>
> GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by root@drbd846.node1, 2016-09-08 08:51:45<br>
> 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated r-----<br>
> ns:1048508 nr:0 dw:0 dr:1049236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0<br>
><br>
> The peer disk state is Outdated, not DUnknown.<br>
><br>
></p>