Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 27 Sep 2016 11:51 pm, "刘丹" <mzlld1988 at 163.com> wrote:
>
> Failed to send to one or more email servers, so sending again.
>
> At 2016-09-27 15:47:37, "Nick Wang" <nwang at suse.com> wrote:
> >>>> On 2016-9-26 at 19:17, in message
> >>>> <CACp6BS7W6PyW=453WkrRFGSZ+f0mqHqL2m9FjSXFicOCQ+wiwA at mail.gmail.com>,
> >>>> Igor Cicimov <igorc at encompasscorporation.com> wrote:
> >> On 26 Sep 2016 7:26 pm, "mzlld1988" <mzlld1988 at 163.com> wrote:
> >> > I applied the attached patch file to scripts/drbd.ocf; pacemaker
> >> > can then start drbd successfully, but only on two nodes. The third
> >> > node's drbd is down. Is that right?
> >> Well, you didn't say you have 3 nodes. Usually you use pacemaker
> >> with 2 nodes and drbd.
> > The patch is supposed to help in the 3-(or more-)node scenario, as
> > long as there is only one Primary.
> > Is the 3-node DRBD cluster working without pacemaker? And how did
> > you configure it in pacemaker?
> According to http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf, I
> executed the following commands to configure drbd in pacemaker.
> [root at pcmk-1 ~]# pcs cluster cib drbd_cfg
> [root at pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \
>     drbd_resource=wwwdata op monitor interval=60s
> [root at pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
>     master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
>     notify=true
You need clone-max=3 for 3 nodes.
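
Something like this should do it (untested sketch; it only changes
clone-max in the command you posted above):

    [root at pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
        master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \
        notify=true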
> [root at pcmk-1 ~]# pcs cluster cib-push drbd_cfg
> >> > And another question: can pacemaker successfully stop the slave
> >> > node? My result is that pacemaker can't stop the slave node.
> > Yes; we need to check the log for which resource prevents pacemaker
> > from stopping.
>
> Pacemaker can't stop the slave node's drbd. I think the reason may be
> the same as in my previous email (see the attached file), but no one
> replied to that email.
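
One way to see why the stop fails is to run the stop operation outside of
cluster control, so the resource agent's stderr goes straight to your
terminal (just a suggestion, not verified on your setup; WebData is the
resource name from your configuration above):

    [root at drbd ~]# pcs resource debug-stop WebData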
> [root at drbd ~]# pcs status
> Cluster name: mycluster
> Stack: corosync
> Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum
> Last updated: Mon Sep 26 04:36:50 2016
> Last change: Mon Sep 26 04:36:49 2016 by root via cibadmin on drbd.node101
>
> 3 nodes and 2 resources configured
>
> Online: [ drbd.node101 drbd.node102 drbd.node103 ]
>
> Full list of resources:
>
> Master/Slave Set: WebDataClone [WebData]
> Masters: [ drbd.node102 ]
> Slaves: [ drbd.node101 ]
>
> Daemon Status:
>   corosync: active/corosync.service is not a native service,
>     redirecting to /sbin/chkconfig.
>     Executing /sbin/chkconfig corosync --level=5
>     enabled
>   pacemaker: active/pacemaker.service is not a native service,
>     redirecting to /sbin/chkconfig.
>     Executing /sbin/chkconfig pacemaker --level=5
>     enabled
>   pcsd: active/enabled
>
> -------------------------------------------------------------
> Failed to execute 'pcs cluster stop drbd.node101'
>
> = Error message on drbd.node101 (secondary node)
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr
>   [ Command 'drbdsetup down r0' terminated with exit code 11 ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr
>   [ r0: State change failed: (-10) State change was refused by peer node ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr
>   [ additional info from kernel: ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr
>   [ failed to disconnect ]
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr
>   [ Command 'drbdsetup down r0' terminated with exit code 11 ]
> Sep 26 04:39:26 drbd crmd[3524]: error: Result of stop operation for
>   WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0
>   timeout=100000ms
> Sep 26 04:39:26 drbd crmd[3524]: notice: drbd.node101-WebData_stop_0:12
>   [ r0: State change failed: (-10) State change was refused by peer
>   node\nadditional info from kernel:\nfailed to disconnect\nCommand
>   'drbdsetup down r0' terminated with exit code 11\nr0: State change
>   failed: (-10) State change was refused by peer node\nadditional info
>   from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0'
>   terminated with exit code 11\nr0: State change failed: (-10) State
>   change was refused by peer node\nadditional info from kernel:\nfailed t
> = Error message on drbd.node102 (primary node)
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote
>   state change 3578772780 (primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFFB)
> Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to
>   be Primary while peer is not outdated
> Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing )
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn(
>   Connected -> TearDown ) peer( Secondary -> Unknown )
> Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed:
>   pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Aborting remote
>   state change 3578772780
> >> > I'm looking forward to your answers. Thanks.
> >> Yes, it works with 2 drbd9 nodes configured in the standard way, not
> >> via drbdmanage. Haven't tried any other layout.
> >
> > Best regards,
> > Nick
> ---------- Forwarded message ----------
> From: "刘丹" <mzlld1988 at 163.com>
> To: drbd-user <drbd-user at lists.linbit.com>
> Cc:
> Date: Sat, 10 Sep 2016 14:46:01 +0800 (CST)
> Subject: Is it normal that we can't directly remove the secondary node
>   when fencing is set?
> Hi everyone,
> I have a question about removing the secondary node of DRBD9.
> When fencing is set, is it normal that we can't remove the secondary
> node of DRBD9, while the same operation succeeds on DRBD 8.4.6?
> The DRBD kernel source is the newest version (9.0.4-1); the DRBD utils
> version is 8.9.6.
> Description:
>     3 nodes, one of which is primary; the disk state is UpToDate.
>     Fencing is set.
>     I got the error message 'State change failed: (-7) State change
>     was refused by peer node' when executing the command
>     'drbdadm down <res-name>' on any of the secondary nodes.
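
A quick way to double-check which fencing policy is actually in effect
(assuming the resource is named r0, as in the logs earlier in this
thread) is to let drbdadm print the configuration as it parsed it:

    drbdadm dump r0 | grep fencing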
>
> Analysis:
>     When the down command is executed on one of the secondary nodes,
>     that node runs change_cluster_wide_state() in drbd_state.c:
>
>     change_cluster_wide_state()
>     {
>         ...
>         if (have_peers) {
>             if (wait_event_timeout(resource->state_wait,
>                                    cluster_wide_reply_ready(resource),
>                                    twopc_timeout(resource))) {
>                 /* ① Wait for the peer node to reply; the thread
>                      sleeps until the peer node replies. */
>                 rv = get_cluster_wide_reply(resource); /* ② Get the
>                                                             reply info. */
>             } else {
>                 ...
>             }
>         }
>         ...
>     }
>
> Process ①
>     The primary node executes the following call chain:
>     ...->try_state_change->is_valid_soft_transition->__is_valid_soft_transition
>     Finally, __is_valid_soft_transition() returns the error code
>     SS_PRIMARY_NOP:
>
>     if (peer_device->connection->fencing_policy >= FP_RESOURCE &&
>         !(role[OLD] == R_PRIMARY && repl_state[OLD] < L_ESTABLISHED &&
>           !(peer_disk_state[OLD] <= D_OUTDATED)) &&
>         (role[NEW] == R_PRIMARY && repl_state[NEW] < L_ESTABLISHED &&
>          !(peer_disk_state[NEW] <= D_OUTDATED)))
>             return SS_PRIMARY_NOP;
>
>     The primary node then sets the drbd_packet to P_TWOPC_NO; the
>     secondary node receives the reply and sets the connection status
>     to TWOPC_NO. At this point, Process ① is finished.
>
> Process ②
>     rv will be set to SS_CW_FAILED_BY_PEER.
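
If this is the check that fails, one possible workaround (untested, and
derived only from the condition quoted above) would be to mark the
leaving node's disk Outdated before taking it down, so that
peer_disk_state <= D_OUTDATED already holds on the primary and
SS_PRIMARY_NOP is never returned:

    drbdadm outdate <res-name>   # on the secondary that is being removed
    drbdadm down <res-name>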
>
> ==== 8.4.6 version ====
>     One node is primary, the other is secondary.
>     When 'drbdadm down <res-name>' is executed on the secondary node,
>     the same error message is recorded in the log on the first attempt,
>     which changes the peer disk to D_UNKNOWN.
>     But the command succeeds on the second attempt, by changing the
>     peer disk to D_OUTDATED.
>
>     The following code reports the error:
>
>     is_valid_state()
>     {
>         ...
>         if (fp >= FP_RESOURCE &&
>             ns.role == R_PRIMARY && ns.conn < C_CONNECTED &&
>             ns.pdsk >= D_UNKNOWN /* ① */) {
>             rv = SS_PRIMARY_NOP;
>         }
>         ...
>     }
>
>     After executing 'drbdadm down <res-name>' on the secondary node,
>     the status of the primary node is:
>
>     [root at drbd846 drbd-8.4.6]# cat /proc/drbd
>     version: 8.4.6 (api:1/proto:86-101)
>     GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by
>       root at drbd846.node1, 2016-09-08 08:51:45
>      0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated r-----
>         ns:1048508 nr:0 dw:0 dr:1049236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>         ep:1 wo:f oos:0
>
>     The peer disk state is Outdated, not DUnknown.
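
For comparison on DRBD 9, where /proc/drbd no longer shows the detailed
state, the peer-disk state would have to be checked with something like:

    drbdadm status r0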