<div dir="ltr">I can do start my cluster but.<div><br></div><div><div style="font-family:arial,sans-serif;font-size:13px">doing the failover tests, I reboot the master node and the slave node not assume the services </div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
in the log i have it:</div><div style="font-family:arial,sans-serif;font-size:13px">Oct 16 09:20:23 ha-slave <span class="">drbd</span>[7125]: ERROR: postgresql: Exit code 11<br></div><div style="font-family:arial,sans-serif;font-size:13px">
<div>Oct 16 09:20:23 ha-slave <span class="">drbd</span>[7125]: ERROR: postgresql: Command output:</div><div>Oct 16 09:20:24 ha-slave crmd: [1033]: ERROR: process_lrm_event: LRM operation drbd_postgresql:1_promote_0 (89) Timed Out (timeout=20000ms)</div>
<div>Oct 16 09:20:29 ha-slave <span class="">drbd</span>[7585]: ERROR: postgresql: Called drbdadm -c /etc/<span class="">drbd</span>.conf primary postgresql</div></div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">I belive the slave node, not assume the primary on <span class="">drbd</span> </div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
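A rough sketch of what I intend to check on ha-slave after the next failed promote (resource name taken from my postgresql.res; the exact output will differ):

# cat /proc/drbd                (connection and role as seen by the kernel)
# drbdadm role postgresql       (e.g. Secondary/Unknown on the node that refuses to promote)
# drbdadm cstate postgresql     (connection state: Connected, WFConnection, StandAlone, ...)
# drbdadm dstate postgresql     (disk state, e.g. UpToDate/UpToDate)
# drbdadm primary postgresql    (retry the promote by hand)
# dmesg | tail                  (the kernel log usually says why the promote was refused)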
On Mon, Oct 14, 2013 at 12:43 PM, Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

On Mon, Oct 14, 2013 at 11:15:31AM -0300, Thomaz Luiz Santos wrote:
> > You should *not* start DRBD from the init script.
> >  # chkconfig drbd off
>
> *** OK, removed it from startup at boot
>
>
> > You should *NOT* configure "no-disk-drain".
> > It is likely to corrupt your data.
> ** OK, removed no-disk-drain from postgresql.res
>
> # cat postgresql.res
> resource postgresql {
>   startup {
>     wfc-timeout 15;
>     degr-wfc-timeout 60;
>   }
>
>   syncer {
>     rate 150M;
>     verify-alg md5;
>   }
>
>   on ha-master {
>      device /dev/drbd0;
>      disk /dev/sdb1;
>      address 172.70.65.210:7788;
>      meta-disk internal;
>   }
>
>   on ha-slave {
>      device /dev/drbd0;
>      disk /dev/sdb1;
>      address 172.70.65.220:7788;
>      meta-disk internal;
>  }
>
> }
>
>
> > You should configure monitoring ops for DRBD.
> > One each for Master and Slave role, with different intervals.
>
> ** how can I do that?
>
> from:
> http://www.drbd.org/users-guide-9.0/s-pacemaker-crm-drbd-backed-service.html
>
> crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
>                     params drbd_resource="mysql" \
>                     op monitor interval="29s" role="Master" \
>                     op monitor interval="31s" role="Slave"
>
>
> > You probably need to "crm resource cleanup ..." a bit.
> ** redid the crm configuration
>
> crm(live)# configure
> crm(live)configure# show
> node ha-master
> node ha-slave
> primitive drbd_postgresql ocf:heartbeat:drbd \
>         params drbd_resource="postgresql" \
>         op monitor interval="29s" role="Master" \
>         op monitor interval="31s" role="Slave"
> primitive fs_postgresql ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> primitive postgresqld lsb:postgresql
> primitive vip_cluster ocf:heartbeat:IPaddr2 \
>         params ip="172.70.65.200" nic="eth0:1"
> group postgresql fs_postgresql vip_cluster postgresqld
> ms ms_drbd_postgresql drbd_postgresql \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true"
> colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
> order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
>
> > You may need to manually remove fencing constraints (if DRBD finished
> > the resync when no pacemaker was running yet, it would not have been
> > able to remove it from its handler).
>
> ** how do I do that?
> ** only on ha-master?
> # drbdadm create-md postgresql
> # drbdadm up postgresql
> # drbdadm -- --overwrite-data-of-peer primary postgresql

Nope.
cat /proc/drbd
crm configure show

   drbd UpToDate/UpToDate, but
   fencing constraints present?
   --> delete those fencing constraints
   crm configure
       delete id-of-those-constraints...
> You may need to *read the logs*.
> The answer will be in there.
>
> > You have:
> > Failed actions:
> >     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > status=complete): unknown error
>
> > So start looking for that, and see what it complains about.
>
> # cat /var/log/syslog | grep drbd_postgresql
> *** Syslog
>
> Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:1_last_failure_0 on ha-slave returned 1 (unknown error) instead of the expected value: 0 (ok)
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:1_last_failure_0 on ha-slave: unknown error (1)
> Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:0_last_failure_0 on ha-master returned 1 (unknown error) instead of the expected value: 0 (ok)
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:0_last_failure_0 on ha-master: unknown error (1)
> Oct 14 11:10:08 ha-master pengine: [786]: info: clone_print:  Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
> Oct 14 11:10:08 ha-master pengine: [786]: info: short_print:      Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
> Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-slave
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures (max=1000000)

Boring :-(
These logs just say that "there was a fatal error before,
and that's why I won't even try again".
And they will tell you this repeatedly, every time
the policy engine runs.

You need to "crm resource cleanup ...",
then watch the actual start attempt fail.

(Or look in the logs for the last real start attempt,
 and its failure)
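For this setup that would be roughly the following (resource names taken from the crm configuration above; the grep pattern is just one way to narrow the log down):

# crm resource cleanup ms_drbd_postgresql
# crm resource cleanup postgresql
# crm_mon                        (watch the start/promote attempt and whether it fails again)
# grep -E 'drbd_postgresql.*(start|promote)' /var/log/syslog | tail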
<div class="HOEnZb"><div class="h5"><br>
<br>
--<br>
: Lars Ellenberg<br>
: LINBIT | Your Way to High Availability<br>
: DRBD/HA support and consulting <a href="http://www.linbit.com" target="_blank">http://www.linbit.com</a><br>
<br>
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.<br>
__<br>
please don&#39;t Cc me, but send to list   --   I&#39;m subscribed<br>
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
--
------------------------------
Thomaz Luiz Santos
Linux User: #359356
http://thomaz.santos.googlepages.com/