[DRBD-user] pacemaker + corosync and postgresql

Thomaz Luiz Santos thomaz.santos at gmail.com
Thu Oct 17 15:54:58 CEST 2013



I can start my cluster, but when doing failover tests, after I reboot the master node,
the slave node does not take over the services.



In the log I have this:
Oct 16 09:20:23 ha-slave drbd[7125]: ERROR: postgresql: Exit code 11
Oct 16 09:20:23 ha-slave drbd[7125]: ERROR: postgresql: Command output:
Oct 16 09:20:24 ha-slave crmd: [1033]: ERROR: process_lrm_event: LRM operation drbd_postgresql:1_promote_0 (89) Timed Out (timeout=20000ms)
Oct 16 09:20:29 ha-slave drbd[7585]: ERROR: postgresql: Called drbdadm -c /etc/drbd.conf primary postgresql

I believe the slave node does not take over as primary on DRBD.
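
(For illustration, a minimal sketch of checks that might narrow this down on the slave, assuming DRBD 8.x userland and the resource name from the config below; the commands only inspect state, except the last one, which repeats what the resource agent tried:)

# connection and disk state of the resource
cat /proc/drbd
drbdadm cstate postgresql
drbdadm dstate postgresql

# repeat the promote the agent attempted, to see drbdadm's own error message
drbdadm -c /etc/drbd.conf primary postgresql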


On Mon, Oct 14, 2013 at 12:43 PM, Lars Ellenberg
<lars.ellenberg at linbit.com>wrote:

> On Mon, Oct 14, 2013 at 11:15:31AM -0300, Thomaz Luiz Santos wrote:
> > > You should *not* start DRBD from the init script.
> > >  # chkconfig drbd off
> >
> > *** OK, removed it from startup at boot
> >
> >
> > > You should *NOT* configure "no-disk-drain".
> > > It is likely to corrupt your data.
> > ** OK, removed no-disk-drain from postgresql.res
> >
> > # cat postgresql.res
> > resource postgresql {
> >   startup {
> >     wfc-timeout 15;
> >     degr-wfc-timeout 60;
> >   }
> >
> >   syncer {
> >     rate 150M;
> >     verify-alg md5;
> >   }
> >
> >   on ha-master {
> >      device /dev/drbd0;
> >      disk /dev/sdb1;
> >      address 172.70.65.210:7788;
> >      meta-disk internal;
> >   }
> >
> >   on ha-slave {
> >      device /dev/drbd0;
> >      disk /dev/sdb1;
> >      address 172.70.65.220:7788;
> >      meta-disk internal;
> >  }
> >
> > }
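
(For illustration, a minimal sketch of checking and applying the edited resource file on both nodes without a full restart, assuming DRBD 8.x userland:)

drbdadm dump postgresql     # parse and print the resource; catches syntax errors
drbdadm adjust postgresql   # apply changed options to the running resource
cat /proc/drbd              # confirm both nodes stay Connected/UpToDate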
> >
> >
> > > You should configure monitoring ops for DRBD.
> > > One each for Master and Slave role, with different intervals.
> >
> > ** How can I do that?
> >
> > from:
> >
> http://www.drbd.org/users-guide-9.0/s-pacemaker-crm-drbd-backed-service.html
> >
> > crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
> >                     params drbd_resource="mysql" \
> >                     op monitor interval="29s" role="Master" \
> >                     op monitor interval="31s" role="Slave"
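
(For an already-defined primitive, a sketch of one way to add those ops with the crm shell; this assumes crmsh's "configure edit", and the intervals are only illustrative:)

crm configure edit drbd_postgresql
# in the editor, append to the primitive:
#   op monitor interval="29s" role="Master" \
#   op monitor interval="31s" role="Slave"
crm configure show drbd_postgresql   # confirm the ops were saved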
> >
> >
> > > You probably need to "crm resource cleanup ..." a bit.
> > ** Redid the crm configuration:
> >
> > crm(live)# configure
> > crm(live)configure# show
> > node ha-master
> > node ha-slave
> > primitive drbd_postgresql ocf:heartbeat:drbd \
> >         params drbd_resource="postgresql" \
> >         op monitor interval="29s" role="Master" \
> >         op monitor interval="31s" role="Slave"
> > primitive fs_postgresql ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> > primitive postgresqld lsb:postgresql
> > primitive vip_cluster ocf:heartbeat:IPaddr2 \
> >         params ip="172.70.65.200" nic="eth0:1"
> > group postgresql fs_postgresql vip_cluster postgresqld
> > ms ms_drbd_postgresql drbd_postgresql \
> >         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
> > order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         no-quorum-policy="ignore" \
> >         stonith-enabled="false"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
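
(For illustration, a few commands that can sanity-check this configuration and show current failures, assuming the pacemaker CLI tools that ship with 1.1.x and crmsh:)

crm_verify -LV                                             # check the live CIB for errors/warnings
crm_mon -1                                                 # one-shot status, lists failed actions
crm resource failcount ms_drbd_postgresql show ha-slave    # per-node fail count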
> >
> >
> > > You may need to manually remove fencing constraints (if DRBD finished
> > > the resync when no pacemaker was running yet, it would not have been
> > > able to remove it from its handler).
> >
> > ** How do I do that?
> > ** Only on ha-master?
> > # drbdadm create-md postgresql
> > # drbdadm up postgresql
> > # drbdadm -- --overwrite-data-of-peer primary postgresql
>
> Nope.
> cat /proc/drbd
> crm configure show
>
>    drbd UpToDate/UpToDate, but
>    fencing constraints present?
>    --> delete those fencing constraints
>    crm configure
>        delete id-of-those-constraints...
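
(For illustration: the constraint left behind by the crm-fence-peer.sh handler usually has an id starting with "drbd-fence-by-handler"; the exact id below is only an example. A sketch of finding and removing it:)

crm configure show | grep drbd-fence
# e.g.  location drbd-fence-by-handler-postgresql ms_drbd_postgresql \
#          rule $role="Master" -inf: #uname ne ha-master
crm configure delete drbd-fence-by-handler-postgresql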
>
> > You may need to *read the logs*.
> > The answer will be in there.
> >
> > > You have:
> > > Failed actions:
> > >     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > > status=complete): unknown error
> >
> > > So start looking for that, and see what it complains about.
> >
> > # cat /var/log/syslog | grep drbd_postgresql
> > *** Syslog
> >
> > Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:1_last_failure_0 on ha-slave returned 1 (unknown error) instead of the expected value: 0 (ok)
> > Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:1_last_failure_0 on ha-slave: unknown error (1)
> > Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:0_last_failure_0 on ha-master returned 1 (unknown error) instead of the expected value: 0 (ok)
> > Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:0_last_failure_0 on ha-master: unknown error (1)
> > Oct 14 11:10:08 ha-master pengine: [786]: info: clone_print:  Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
> > Oct 14 11:10:08 ha-master pengine: [786]: info: short_print:  Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
> > Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-slave
> > Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures (max=1000000)
>
> Boring :-(
> These logs just say that "there was a fatal error before,
> and that's why I won't even try again".
> And they will tell you this repeatedly, every time
> the policy engine runs.
>
> You need to "crm resource cleanup ...",
> then watch the actual start attempt fail.
>
> (Or look in the logs for the last real start attempt,
>  and its failure)
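
(For illustration, a minimal sketch of the cleanup-and-watch cycle described above, assuming crmsh and syslog logging on the node where the start is attempted:)

crm resource cleanup ms_drbd_postgresql      # clear the INFINITY fail counts
crm_mon                                      # watch pacemaker retry the start/promote
tail -f /var/log/syslog | grep -i drbd       # and capture the real failure message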
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
>



-- 
------------------------------
Thomaz Luiz Santos
Linux User: #359356
http://thomaz.santos.googlepages.com/