<div dir="ltr">I can do start my cluster but.<div><br></div><div><div style="font-family:arial,sans-serif;font-size:13px">doing the failover tests, I reboot the master node and the slave node not assume the services </div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
in the log i have it:</div><div style="font-family:arial,sans-serif;font-size:13px">Oct 16 09:20:23 ha-slave <span class="">drbd</span>[7125]: ERROR: postgresql: Exit code 11<br></div><div style="font-family:arial,sans-serif;font-size:13px">
<div>Oct 16 09:20:23 ha-slave <span class="">drbd</span>[7125]: ERROR: postgresql: Command output:</div><div>Oct 16 09:20:24 ha-slave crmd: [1033]: ERROR: process_lrm_event: LRM operation drbd_postgresql:1_promote_0 (89) Timed Out (timeout=20000ms)</div>
<div>Oct 16 09:20:29 ha-slave <span class="">drbd</span>[7585]: ERROR: postgresql: Called drbdadm -c /etc/<span class="">drbd</span>.conf primary postgresql</div></div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">I belive the slave node, not assume the primary on <span class="">drbd</span> </div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
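A rough sketch of what I intend to check on ha-slave after the next failed promote (resource name taken from my postgresql.res; the exact output will differ):

# cat /proc/drbd                (connection and role as seen by the kernel)
# drbdadm role postgresql       (e.g. Secondary/Unknown on the node that refuses to promote)
# drbdadm cstate postgresql     (connection state: Connected, WFConnection, StandAlone, ...)
# drbdadm dstate postgresql     (disk state, e.g. UpToDate/UpToDate)
# drbdadm primary postgresql    (retry the promote by hand)
# dmesg | tail                  (the kernel log usually says why the promote was refused)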
On Mon, Oct 14, 2013 at 12:43 PM, Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

On Mon, Oct 14, 2013 at 11:15:31AM -0300, Thomaz Luiz Santos wrote:
> > You should *not* start DRBD from the init script.
> >  # chkconfig drbd off
>
> *** OK, removed it from startup at boot
>
>
> > You should *NOT* configure "no-disk-drain".
> > It is likely to corrupt your data.
> ** OK, removed no-disk-drain from postgresql.res
>
> # cat postgresql.res
> resource postgresql {
>   startup {
>     wfc-timeout 15;
>     degr-wfc-timeout 60;
>   }
>
>   syncer {
>     rate 150M;
>     verify-alg md5;
>   }
>
>   on ha-master {
>      device /dev/drbd0;
>      disk /dev/sdb1;
>      address 172.70.65.210:7788;
>      meta-disk internal;
>   }
>
>   on ha-slave {
>      device /dev/drbd0;
>      disk /dev/sdb1;
>      address 172.70.65.220:7788;
>      meta-disk internal;
>  }
>
> }
>
>
> > You should configure monitoring ops for DRBD.
> > One each for Master and Slave role, with different intervals.
>
> ** how can I do that?
>
> from:
> http://www.drbd.org/users-guide-9.0/s-pacemaker-crm-drbd-backed-service.html
>
> crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
>                     params drbd_resource="mysql" \
>                     op monitor interval="29s" role="Master" \
>                     op monitor interval="31s" role="Slave"
>
>
> > You probably need to "crm resource cleanup ..." a bit.
> ** redid the crm configuration
>
> crm(live)# configure
> crm(live)configure# show
> node ha-master
> node ha-slave
> primitive drbd_postgresql ocf:heartbeat:drbd \
>         params drbd_resource="postgresql" \
>         op monitor interval="29s" role="Master" \
>         op monitor interval="31s" role="Slave"
> primitive fs_postgresql ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> primitive postgresqld lsb:postgresql
> primitive vip_cluster ocf:heartbeat:IPaddr2 \
>         params ip="172.70.65.200" nic="eth0:1"
> group postgresql fs_postgresql vip_cluster postgresqld
> ms ms_drbd_postgresql drbd_postgresql \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true"
> colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
> order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
>
> > You may need to manually remove fencing constraints (if DRBD finished
> > the resync when no pacemaker was running yet, it would not have been
> > able to remove it from its handler).
>
> ** how do I do that?
> ** only on ha-master?
> # drbdadm create-md postgresql
> # drbdadm up postgresql
> # drbdadm -- --overwrite-data-of-peer primary postgresql

Nope.
cat /proc/drbd
crm configure show

   drbd UpToDate/UpToDate, but
   fencing constraints present?
   --> delete those fencing constraints
   crm configure
       delete id-of-those-constraints...
> You may need to *read the logs*.
> The answer will be in there.
>
> > You have:
> > Failed actions:
> >     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > status=complete): unknown error
>
> > So start looking for that, and see what it complains about.
>
> # cat /var/log/syslog | grep drbd_postgresql
> *** Syslog
>
> Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:1_last_failure_0 on ha-slave returned 1 (unknown error) instead of the expected value: 0 (ok)
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:1_last_failure_0 on ha-slave: unknown error (1)
> Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:0_last_failure_0 on ha-master returned 1 (unknown error) instead of the expected value: 0 (ok)
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:0_last_failure_0 on ha-master: unknown error (1)
> Oct 14 11:10:08 ha-master pengine: [786]: info: clone_print:  Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
> Oct 14 11:10:08 ha-master pengine: [786]: info: short_print:      Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
> Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-slave
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures (max=1000000)

Boring :-(
These logs just say that "there was a fatal error before,
and that's why I won't even try again".
And they will tell you this repeatedly, every time
the policy engine runs.

You need to "crm resource cleanup ...",
then watch the actual start attempt fail.

(Or look in the logs for the last real start attempt,
 and its failure)
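For this setup that would be roughly the following (resource names taken from the crm configuration above; the grep pattern is just one way to narrow the log down):

# crm resource cleanup ms_drbd_postgresql
# crm resource cleanup postgresql
# crm_mon                        (watch the start/promote attempt and whether it fails again)
# grep -E 'drbd_postgresql.*(start|promote)' /var/log/syslog | tail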
<div class="HOEnZb"><div class="h5"><br>
<br>
--<br>
: Lars Ellenberg<br>
: LINBIT | Your Way to High Availability<br>
: DRBD/HA support and consulting <a href="http://www.linbit.com" target="_blank">http://www.linbit.com</a><br>
<br>
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.<br>
__<br>
please don&#39;t Cc me, but send to list   --   I&#39;m subscribed<br>
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
--
------------------------------
Thomaz Luiz Santos
Linux User: #359356
http://thomaz.santos.googlepages.com/