<p dir="ltr">On 27 Sep 2016 11:51 pm, "刘丹" <<a href="mailto:mzlld1988@163.com">mzlld1988@163.com</a>> wrote:<br>
><br>
> Failed to send to one or more email servers, so sending again.<br>
><br>
><br>
><br>
> At 2016-09-27 15:47:37, "Nick Wang" <<a href="mailto:nwang@suse.com">nwang@suse.com</a>> wrote:<br>
> >>>> On 2016-9-26 at 19:17, in message <CACp6BS7W6PyW=453WkrRFGSZ+f0mqHqL2m9FjSXFicOCQ+wiwA@mail.gmail.com>, Igor Cicimov <<a href="mailto:igorc@encompasscorporation.com">igorc@encompasscorporation.com</a>> wrote:<br>
> >> On 26 Sep 2016 7:26 pm, "mzlld1988" <<a href="mailto:mzlld1988@163.com">mzlld1988@163.com</a>> wrote:<br>
> >> > I applied the attached patch file to scripts/drbd.ocf; then pacemaker can start drbd successfully, but only on two nodes, and the third node's drbd is down. Is that right?<br>
> >> Well, you didn't say you have 3 nodes. Usually you use pacemaker with 2 nodes and drbd.<br>
> > The patch is supposed to help in the 3-node (or more) scenario, as long as there is only one Primary.<br>
> > Is the 3-node DRBD cluster working without pacemaker? And how did you configure it in pacemaker?<br>
> According to <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>, I executed the following commands to configure drbd in pacemaker.<br>
> [root@pcmk-1 ~]# pcs cluster cib drbd_cfg<br>
> [root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \<br>
> drbd_resource=wwwdata op monitor interval=60s<br>
> [root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \<br>
> master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \<br>
> notify=true<br>
You need clone-max=3 for 3 nodes.</p>
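<p dir="ltr">Something like this should define the master/slave set for 3 nodes. It is the same configuration as in your commands above with only clone-max changed; an untested sketch, adjust to your setup:</p>
<p dir="ltr">[root@pcmk-1 ~]# pcs cluster cib drbd_cfg<br>
[root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \<br>
drbd_resource=wwwdata op monitor interval=60s<br>
[root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \<br>
master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \<br>
notify=true<br>
[root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg</p>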
<p dir="ltr">> [root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg<br>
> >> > And another question: can pacemaker successfully stop the slave node? My result is that pacemaker can't stop the slave node.<br>
> > Yes; you need to check the log for which resource prevents pacemaker from stopping.<br>
><br>
> Pacemaker can't stop the slave node's drbd. I think the reason may be the same as in my previous email (see attached file), but no one replied to that email.<br>
> [root@drbd ~]# pcs status<br>
> Cluster name: mycluster<br>
> Stack: corosync<br>
> Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum<br>
> Last updated: Mon Sep 26 04:36:50 2016 Last change: Mon Sep 26 04:36:49 2016 by root via cibadmin on drbd.node101<br>
><br>
> 3 nodes and 2 resources configured<br>
><br>
> Online: [ drbd.node101 drbd.node102 drbd.node103 ]<br>
><br>
> Full list of resources:<br>
><br>
> Master/Slave Set: WebDataClone [WebData]<br>
> Masters: [ drbd.node102 ]<br>
> Slaves: [ drbd.node101 ]<br>
><br>
> Daemon Status:<br>
> corosync: active/corosync.service is not a native service, redirecting to /sbin/chkconfig.<br>
> Executing /sbin/chkconfig corosync --level=5<br>
> enabled<br>
> pacemaker: active/pacemaker.service is not a native service, redirecting to /sbin/chkconfig.<br>
> Executing /sbin/chkconfig pacemaker --level=5<br>
> enabled<br>
> pcsd: active/enabled<br>
><br>
> -------------------------------------------------------------<br>
> Failed to execute ‘pcs cluster stop drbd.node101’<br>
><br>
> = Error message on drbd.node101 (secondary node)<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ r0: State change failed: (-10) State change was refused by peer node ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ additional info from kernel: ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ failed to disconnect ]<br>
> Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]<br>
> Sep 26 04:39:26 drbd crmd[3524]: error: Result of stop operation for WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0 timeout=100000ms<br>
> Sep 26 04:39:26 drbd crmd[3524]: notice: drbd.node101-WebData_stop_0:12 [ r0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed t<br>
> = Error message on drbd.node102 (primary node)<br>
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote state change 3578772780 (primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFFB)<br>
> Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to be Primary while peer is not outdated<br>
> Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing)<br>
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn( Connected -> TearDown ) peer( Secondary -> Unknown )<br>
> Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )<br>
> Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Aborting remote state change 3578772780<br>
> >> > I'm looking forward to your answers. Thanks.<br>
> >> Yes, it works with 2 nodes, drbd9 configured in the standard way, not via drbdmanage. Haven't tried any other layout.<br>
> > Best regards,<br>
> > Nick<br>
><br>
><br>
> ---------- Forwarded message ----------<br>
> From: "刘丹" <<a href="mailto:mzlld1988@163.com">mzlld1988@163.com</a>><br>
> To: drbd-user <<a href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a>><br>
> Cc: <br>
> Date: Sat, 10 Sep 2016 14:46:01 +0800 (CST)<br>
> Subject: Is it normal that we can't directly remove the secondary node when fencing is set?<br>
> Hi everyone,<br>
> I have a question about removing the secondary node of a DRBD 9 cluster.<br>
> When fencing is set, is it normal that we can't remove the secondary node with DRBD 9, while the same operation succeeds with DRBD 8.4.6?<br>
> The DRBD kernel source is the newest version (9.0.4-1); the DRBD utils version is 8.9.6.<br>
> Description:<br>
> 3 nodes, one of them Primary, disk state UpToDate, fencing set.<br>
> I get the error 'State change failed: (-7) State change was refused by peer node' when executing 'drbdadm down <res-name>' on any of the secondary nodes.<br>
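> A minimal reproduction sketch of the above (assuming the resource is named r0, which is not stated here, and that fencing is configured):<br>
> # on a Secondary node while another node is still Primary<br>
> [root@drbd ~]# drbdadm down r0<br>
> r0: State change failed: (-7) State change was refused by peer node<br>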
><br>
> Analysis:<br>
> When the down command is executed on one of the secondary nodes, that node calls change_cluster_wide_state() in drbd_state.c:<br>
> change_cluster_wide_state()<br>
> {<br>
> ...<br>
>     if (have_peers) {<br>
>         if (wait_event_timeout(resource->state_wait,<br>
>                                cluster_wide_reply_ready(resource),<br>
>                                twopc_timeout(resource))) {  /* ① wait for the peer nodes to reply; the thread sleeps until the replies arrive */<br>
>             rv = get_cluster_wide_reply(resource);          /* ② get the reply info */<br>
>         } else {<br>
>         }<br>
>     }<br>
> ...<br>
> }<br>
><br>
> Process ①<br>
> The primary node executes the following call chain:<br>
> ... -> try_state_change() -> is_valid_soft_transition() -> __is_valid_soft_transition()<br>
> Finally, __is_valid_soft_transition() returns the error code SS_PRIMARY_NOP:<br>
><br>
> if (peer_device->connection->fencing_policy >= FP_RESOURCE &&<br>
> !(role[OLD] == R_PRIMARY && repl_state[OLD] < L_ESTABLISHED && !(peer_disk_state[OLD] <= D_OUTDATED)) &&<br>
> (role[NEW] == R_PRIMARY && repl_state[NEW] < L_ESTABLISHED && !(peer_disk_state[NEW] <= D_OUTDATED)))<br>
> return SS_PRIMARY_NOP;<br>
><br>
> The primary node then replies with the packet P_TWOPC_NO, and on receiving that reply the secondary node sets the connection status to TWOPC_NO.<br>
> At this point, process ① finishes.<br>
><br>
> Process ②<br>
> rv is set to SS_CW_FAILED_BY_PEER, so the down command on the secondary node fails.<br>
> <br>
> ==== DRBD 8.4.6 ====<br>
> One node is Primary, the other is Secondary.<br>
> When 'drbdadm down <res-name>' is executed on the secondary node, the same error is written to the log on the first state-change attempt, which tries to set the peer disk to D_UNKNOWN.<br>
> But the command succeeds because the second attempt changes the peer disk to D_OUTDATED instead.<br>
> <br>
> The following code reports the error:<br>
> is_valid_state()<br>
> {<br>
> ...<br>
>     if (fp >= FP_RESOURCE &&<br>
>         ns.role == R_PRIMARY && ns.conn < C_CONNECTED && ns.pdsk >= D_UNKNOWN /* ① */) {<br>
>         rv = SS_PRIMARY_NOP;<br>
>     }<br>
> ...<br>
> }<br>
> <br>
> After executing 'drbdadm down <res-name>' on the secondary node, the status of the primary node is:<br>
> [root@drbd846 drbd-8.4.6]# cat /proc/drbd<br>
> version: 8.4.6 (api:1/proto:86-101)<br>
> GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by root@drbd846.node1, 2016-09-08 08:51:45<br>
> 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated r-----<br>
> ns:1048508 nr:0 dw:0 dr:1049236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0<br>
><br>
> The peer disk state is Outdated, not DUnknown.<br>
><br>
></p>