<div dir="ltr">I can do start my cluster but.<div><br></div><div><div style="font-family:arial,sans-serif;font-size:13px">doing the failover tests, I reboot the master node and the slave node not assume the services </div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">
in the log i have it:</div><div style="font-family:arial,sans-serif;font-size:13px">Oct 16 09:20:23 ha-slave <span class="">drbd</span>[7125]: ERROR: postgresql: Exit code 11<br></div><div style="font-family:arial,sans-serif;font-size:13px">
<div>Oct 16 09:20:23 ha-slave <span class="">drbd</span>[7125]: ERROR: postgresql: Command output:</div><div>Oct 16 09:20:24 ha-slave crmd: [1033]: ERROR: process_lrm_event: LRM operation drbd_postgresql:1_promote_0 (89) Timed Out (timeout=20000ms)</div>
<div>Oct 16 09:20:29 ha-slave <span class="">drbd</span>[7585]: ERROR: postgresql: Called drbdadm -c /etc/<span class="">drbd</span>.conf primary postgresql</div></div><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">I belive the slave node, not assume the primary on <span class="">drbd</span> </div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Mon, Oct 14, 2013 at 12:43 PM, Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

On Mon, Oct 14, 2013 at 11:15:31AM -0300, Thomaz Luiz Santos wrote:
> > You should *not* start DRBD from the init script.
> > # chkconfig drbd off
>
> *** OK, removed it from the boot startup.
>
>
> > You should *NOT* configure "no-disk-drain".
> > It is likely to corrupt your data.
> ** OK, removed no-disk-drain from postgresql.res.
>
> # cat postgresql.res
> resource postgresql {
>     startup {
>         wfc-timeout 15;
>         degr-wfc-timeout 60;
>     }
>
>     syncer {
>         rate 150M;
>         verify-alg md5;
>     }
>
>     on ha-master {
>         device /dev/drbd0;
>         disk /dev/sdb1;
>         address 172.70.65.210:7788;
>         meta-disk internal;
>     }
>
>     on ha-slave {
>         device /dev/drbd0;
>         disk /dev/sdb1;
>         address 172.70.65.220:7788;
>         meta-disk internal;
>     }
>
> }
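>
> ** Not in my config yet: resource-level fencing. If I understood the
> ** users guide, that would be a disk/handlers section roughly like the
> ** sketch below (the handler scripts ship with DRBD); please correct me
> ** if this is not what the fencing handler you mention refers to:
>
>     disk {
>         fencing resource-only;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>     }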
>
>
> > You should configure monitoring ops for DRBD.
> > One each for Master and Slave role, with different intervals.
>
> ** How can I do that? Like this?
>
> From:
> http://www.drbd.org/users-guide-9.0/s-pacemaker-crm-drbd-backed-service.html
>
> crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
>     params drbd_resource="mysql" \
>     op monitor interval="29s" role="Master" \
>     op monitor interval="31s" role="Slave"
>
>
> > You probably need to "crm resource cleanup ..." a bit.
> ** I redid the crm configuration:
>
> crm(live)# configure
> crm(live)configure# show
> node ha-master
> node ha-slave
> primitive drbd_postgresql ocf:heartbeat:drbd \
>     params drbd_resource="postgresql" \
>     op monitor interval="29s" role="Master" \
>     op monitor interval="31s" role="Slave"
> primitive fs_postgresql ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> primitive postgresqld lsb:postgresql
> primitive vip_cluster ocf:heartbeat:IPaddr2 \
>     params ip="172.70.65.200" nic="eth0:1"
> group postgresql fs_postgresql vip_cluster postgresqld
> ms ms_drbd_postgresql drbd_postgresql \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>     clone-node-max="1" notify="true"
> colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
> order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>
>
> > You may need to manually remove fencing constraints (if DRBD finished
> > the resync when no pacemaker was running yet, it would not have been
> > able to remove it from its handler).
>
> ** How do I do that?
> ** Only on ha-master? Is it these commands:
> # drbdadm create-md postgresql
> # drbdadm up postgresql
> # drbdadm -- --overwrite-data-of-peer primary postgresql

Nope.
cat /proc/drbd
crm configure show

drbd UpToDate/UpToDate, but
fencing constraints present?
--> delete those fencing constraints
crm configure
delete id-of-those-constraints...
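
(If the fence-peer handler, crm-fence-peer.sh, put one there, it shows up
in "crm configure show" roughly like this -- the exact id and rule depend
on the handler version, so take this as a sketch only:

  location drbd-fence-by-handler-postgresql ms_drbd_postgresql \
          rule $role="Master" -inf: #uname ne ha-master

and you would remove it with:

  crm configure
  delete drbd-fence-by-handler-postgresql
  commit
)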
<div class="im"><br>
> You may need to *read the logs*.<br>
> The answer will be in there.<br>
><br>
> > You have:
> > Failed actions:
> >     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> >     status=complete): unknown error
>
> > So start looking for that, and see what it complains about.
>
> # cat /var/log/syslog | grep drbd_postgresql
> *** Syslog
>
> Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:1_last_failure_0 on ha-slave returned 1 (unknown error) instead of the expected value: 0 (ok)
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:1_last_failure_0 on ha-slave: unknown error (1)
> Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:0_last_failure_0 on ha-master returned 1 (unknown error) instead of the expected value: 0 (ok)
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:0_last_failure_0 on ha-master: unknown error (1)
> Oct 14 11:10:08 ha-master pengine: [786]: info: clone_print: Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
> Oct 14 11:10:08 ha-master pengine: [786]: info: short_print: Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
> Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-slave
> Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures (max=1000000)

Boring :-(
These logs just say that "there was a fatal error before,
and that's why I won't even try again".
And they will tell you this repeatedly, every time
the policy engine runs.

You need to "crm resource cleanup ...",
then watch the actual start attempt fail.

(Or look in the logs for the last real start attempt,
and its failure.)
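
For example (resource name taken from your config above; a sketch only):

  crm resource cleanup ms_drbd_postgresql

then watch it with "crm_mon -f" (shows the fail counts) while the cluster
retries the start/promote, and look at the syslog on ha-slave around that
time for what the drbd resource agent complains about.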
<div class="HOEnZb"><div class="h5"><br>
<br>
--<br>
: Lars Ellenberg<br>
: LINBIT | Your Way to High Availability<br>
: DRBD/HA support and consulting <a href="http://www.linbit.com" target="_blank">http://www.linbit.com</a><br>
<br>
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.<br>
__<br>
please don't Cc me, but send to list -- I'm subscribed<br>

--
------------------------------
Thomaz Luiz Santos
Linux User: #359356
http://thomaz.santos.googlepages.com/