Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 27 Sep 2016 11:51 pm, "刘丹" <mzlld1988 at 163.com> wrote:
>
> Failed to send to one or more email servers, so sending again.
>
> At 2016-09-27 15:47:37, "Nick Wang" <nwang at suse.com> wrote:
>>>> On 2016-9-26 at 19:17, in message
>>>> <CACp6BS7W6PyW=453WkrRFGSZ+f0mqHqL2m9FjSXFicOCQ+wiwA at mail.gmail.com>,
>>>> Igor Cicimov <igorc at encompasscorporation.com> wrote:
>> On 26 Sep 2016 7:26 pm, "mzlld1988" <mzlld1988 at 163.com> wrote:
>> >
>> > I applied the attached patch file to scripts/drbd.ocf; pacemaker can
>> > then start drbd successfully, but only on two nodes. The third node's
>> > drbd is down. Is that right?
>>
>> Well, you didn't say you have 3 nodes. Usually you use pacemaker with
>> 2 nodes and drbd.
>
>The patch is supposed to help in 3-node (or more) scenarios, as long as
>there is only one Primary.
>Is the 3-node DRBD cluster working without pacemaker? And how did you
>configure it in pacemaker?
>
> According to http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf, I
> executed the following commands to configure drbd in pacemaker.
>
> [root@pcmk-1 ~]# pcs cluster cib drbd_cfg
> [root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \
>     drbd_resource=wwwdata op monitor interval=60s
> [root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
>     master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
>     notify=true

You need clone-max=3 for 3 nodes.

> [root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg
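For three nodes the master/slave set needs three clone instances, as noted
above. A minimal sketch of the corrected step, assuming the same resource
names as in the quoted commands (only clone-max changes; master-max stays
at 1 so there is still a single Primary):

    [root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
        master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \
        notify=true
    [root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg

On an already-running cluster, updating the meta attribute in place should
achieve the same result (exact syntax varies between pcs versions):

    [root@pcmk-1 ~]# pcs resource meta WebDataClone clone-max=3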
>> > And another question: can pacemaker successfully stop the slave
>> > node? My result is that pacemaker can't stop the slave node.
>> >
>Yes, need to check the log to see which resource prevents pacemaker
>from stopping.
>
> Pacemaker can't stop the slave node's drbd. I think the reason may be
> the same as in my previous email (see attached file), but no one
> replied to that email.
>
> [root@drbd ~]# pcs status
> Cluster name: mycluster
> Stack: corosync
> Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum
> Last updated: Mon Sep 26 04:36:50 2016  Last change: Mon Sep 26 04:36:49 2016 by root via cibadmin on drbd.node101
>
> 3 nodes and 2 resources configured
>
> Online: [ drbd.node101 drbd.node102 drbd.node103 ]
>
> Full list of resources:
>
>  Master/Slave Set: WebDataClone [WebData]
>      Masters: [ drbd.node102 ]
>      Slaves: [ drbd.node101 ]
>
> Daemon Status:
>   corosync: active/corosync.service is not a native service, redirecting to /sbin/chkconfig.
>   Executing /sbin/chkconfig corosync --level=5
>   enabled
>   pacemaker: active/pacemaker.service is not a native service, redirecting to /sbin/chkconfig.
>   Executing /sbin/chkconfig pacemaker --level=5
>   enabled
>   pcsd: active/enabled
>
> -------------------------------------------------------------
> Failed to execute 'pcs cluster stop drbd.node101'
>
> = Error message on drbd.node101 (secondary node) =
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ r0: State change failed: (-10) State change was refused by peer node ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ additional info from kernel: ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ failed to disconnect ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]
> Sep 26 04:39:26 drbd crmd[3524]: error: Result of stop operation for WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0 timeout=100000ms
> Sep 26 04:39:26 drbd crmd[3524]: notice: drbd.node101-WebData_stop_0:12 [ r0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed t
>
> = Error message on drbd.node102 (primary node) =
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote state change 3578772780 (primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFFB)
> Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to be Primary while peer is not outdated
> Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing )
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
> Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Aborting remote state change 3578772780
>
>> > I'm looking forward to your answers. Thanks.
>> >
>> Yes, it works with 2 nodes drbd9 configured in the standard way, not
>> via drbdmanage. Haven't tried any other layout.
>>
>
>Best regards,
>Nick

> ---------- Forwarded message ----------
> From: "刘丹" <mzlld1988 at 163.com>
> To: drbd-user <drbd-user at lists.linbit.com>
> Cc:
> Date: Sat, 10 Sep 2016 14:46:01 +0800 (CST)
> Subject: Is it normal that we can't directly remove the secondary node when fencing is set?
>
> Hi everyone,
> I have a question about removing the secondary node of DRBD9.
> When fencing is set, is it normal that we can't remove the secondary
> node with DRBD9, while the same operation succeeds with DRBD 8.4.6?
> The DRBD kernel module is the newest version (9.0.4-1); the version of
> drbd-utils is 8.9.6.
>
> Description:
> 3 nodes; one of the nodes is Primary, with disk state UpToDate. Fencing
> is set.
> I get the error message 'State change failed: (-7) State change was
> refused by peer node' when executing 'drbdadm down <res-name>' on any
> of the secondary nodes.
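The "fencing is set" premise matters for everything below, so for context,
here is a minimal sketch of what such a configuration typically looks like.
This is an assumption about the setup, not the poster's actual resource
file: the resource name r0 matches the logs above, in DRBD 9 the fencing
keyword is a per-connection (net section) option (in 8.4 it lives in the
disk section), and the handler paths are the crm-fence-peer scripts shipped
with drbd-utils:

    resource r0 {
        net {
            # this is what makes fencing_policy >= FP_RESOURCE
            # in the kernel checks quoted below
            fencing resource-only;    # or resource-and-stonith
        }
        handlers {
            # asks the CRM to outdate the peer when it becomes unreachable
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # lifts the constraint once the peer has resynced
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        ...
    }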
>
> Analysis:
> When the down command is executed on one of the secondary nodes, that
> node runs change_cluster_wide_state() in drbd_state.c:
>
> change_cluster_wide_state()
> {
>     ...
>     if (have_peers) {
>         if (wait_event_timeout(resource->state_wait,
>                                cluster_wide_reply_ready(resource),
>                                twopc_timeout(resource))) {
>             /* ① wait for the peer node to reply; the thread sleeps
>                until the peer node answers */
>             rv = get_cluster_wide_reply(resource);  /* ② get the reply */
>         } else {
>             ...
>         }
>     }
>     ...
> }
>
> Process ①
> The primary node runs the following call chain:
>     ... -> try_state_change -> is_valid_soft_transition
>         -> __is_valid_soft_transition
> and __is_valid_soft_transition finally returns the error code
> SS_PRIMARY_NOP:
>
>     if (peer_device->connection->fencing_policy >= FP_RESOURCE &&
>         !(role[OLD] == R_PRIMARY && repl_state[OLD] < L_ESTABLISHED &&
>           !(peer_disk_state[OLD] <= D_OUTDATED)) &&
>          (role[NEW] == R_PRIMARY && repl_state[NEW] < L_ESTABLISHED &&
>           !(peer_disk_state[NEW] <= D_OUTDATED)))
>             return SS_PRIMARY_NOP;
>
> The primary node then replies with the drbd_packet P_TWOPC_NO, and the
> secondary node records the reply by setting the connection status to
> TWOPC_NO. At this point process ① finishes.
>
> Process ②
> rv is set to SS_CW_FAILED_BY_PEER.
>
> ==== 8.4.6 version ====
> One node is Primary, the other Secondary.
> When 'drbdadm down <res-name>' is executed on the secondary node, the
> first attempt logs the same error while trying to change the peer disk
> to D_UNKNOWN, but the command succeeds on the second attempt, which
> changes the peer disk to D_OUTDATED instead.
>
> The following code reports the error:
>
> is_valid_state()
> {
>     ...
>     if (fp >= FP_RESOURCE &&
>         ns.role == R_PRIMARY && ns.conn < C_CONNECTED &&
>         ns.pdsk >= D_UNKNOWN /* ① */) {
>             rv = SS_PRIMARY_NOP;
>     }
>     ...
> }
>
> After executing 'drbdadm down <res-name>' on the secondary node, the
> status of the primary node is:
>
> [root@drbd846 drbd-8.4.6]# cat /proc/drbd
> version: 8.4.6 (api:1/proto:86-101)
> GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by root@drbd846.node1, 2016-09-08 08:51:45
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated r-----
>     ns:1048508 nr:0 dw:0 dr:1049236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> The peer disk state is Outdated, not DUnknown.
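Putting the 9.0.4 and 8.4.6 checks quoted above side by side, both enforce
the same rule; what differs is what the refusing node believes about the
peer's disk when the check runs. A self-contained paraphrase of the
NEW-state condition (stand-in enums with the same relative ordering as in
the DRBD headers, not the literal kernel code):

    #include <stdio.h>

    /* Stand-in enums: only the values the check needs, in the same
     * relative order as in the DRBD headers. D_UNKNOWN sorting above
     * D_OUTDATED is what makes the comparison below work. */
    enum fencing_policy  { FP_DONT_CARE, FP_RESOURCE, FP_STONITH };
    enum drbd_role       { R_UNKNOWN, R_PRIMARY, R_SECONDARY };
    enum drbd_repl_state { L_OFF, L_ESTABLISHED };
    enum drbd_disk_state { D_INCONSISTENT, D_OUTDATED, D_UNKNOWN,
                           D_CONSISTENT, D_UP_TO_DATE };

    /* Paraphrase of both SS_PRIMARY_NOP checks: with a fencing policy of
     * resource-only or stricter, a Primary refuses any state change that
     * leaves it Primary and disconnected from a peer whose disk it cannot
     * prove to be Outdated (or worse). Note that 8.4's
     * "ns.pdsk >= D_UNKNOWN" and 9's "!(pdsk <= D_OUTDATED)" select the
     * same set of disk states. */
    static int refuses_state_change(enum fencing_policy fp,
                                    enum drbd_role role,
                                    enum drbd_repl_state repl,
                                    enum drbd_disk_state pdsk)
    {
        return fp >= FP_RESOURCE &&
               role == R_PRIMARY &&
               repl < L_ESTABLISHED &&   /* peer would be disconnected */
               !(pdsk <= D_OUTDATED);    /* peer disk not provably outdated */
    }

    int main(void)
    {
        /* 9.0.4 trace above: the down would leave pdsk = DUnknown -> refused */
        printf("pdsk=DUnknown: refused=%d\n",
               refuses_state_change(FP_RESOURCE, R_PRIMARY, L_OFF, D_UNKNOWN));
        /* 8.4.6 second attempt: pdsk proposed as Outdated -> allowed */
        printf("pdsk=Outdated: refused=%d\n",
               refuses_state_change(FP_RESOURCE, R_PRIMARY, L_OFF, D_OUTDATED));
        return 0;
    }

Read this way, the 8.4.6 behaviour above is consistent: the first
'drbdadm down' is refused while the peer disk would become D_UNKNOWN, but
the retry proposes D_OUTDATED, for which the check is false, so the down
succeeds and the Primary ends up at ds:UpToDate/Outdated. In 9.0.4 the
refused two-phase commit aborts the whole remote state change atomically,
and (at least in the trace above) no second attempt proposing an Outdated
peer disk follows, so every 'drbdadm down' fails the same way.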