Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I can start my cluster, but during the failover tests, when I reboot the
master node, the slave node does not take over the services. In the log I
have:

Oct 16 09:20:23 ha-slave drbd[7125]: ERROR: postgresql: Exit code 11
Oct 16 09:20:23 ha-slave drbd[7125]: ERROR: postgresql: Command output:
Oct 16 09:20:24 ha-slave crmd: [1033]: ERROR: process_lrm_event: LRM operation drbd_postgresql:1_promote_0 (89) Timed Out (timeout=20000ms)
Oct 16 09:20:29 ha-slave drbd[7585]: ERROR: postgresql: Called drbdadm -c /etc/drbd.conf primary postgresql

I believe the slave node does not become primary on DRBD.
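A note on the promote timeout above: the crm configuration quoted below declares
no explicit start/promote/stop timeouts, so the promote_0 operation runs under
Pacemaker's 20-second default operation timeout and is killed while drbdadm is
still failing. A minimal sketch of declaring explicit operation timeouts on the
DRBD primitive (the timeout values are illustrative, not from this thread, and
the agent name follows the quoted guide's ocf:linbit:drbd rather than the
ocf:heartbeat:drbd used in the configuration below):

    crm(live)configure# primitive drbd_postgresql ocf:linbit:drbd \
            params drbd_resource="postgresql" \
            op start timeout="240s" \
            op promote timeout="90s" \
            op demote timeout="90s" \
            op stop timeout="100s" \
            op monitor interval="29s" role="Master" \
            op monitor interval="31s" role="Slave"

A longer timeout only helps if the promotion can eventually succeed; if drbdadm
keeps exiting with code 11, the underlying DRBD state (peer still Primary, a
leftover fencing constraint, data not UpToDate) still has to be fixed first,
as discussed in the exchange below.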
On Mon, Oct 14, 2013 at 12:43 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Mon, Oct 14, 2013 at 11:15:31AM -0300, Thomaz Luiz Santos wrote:
> > > You should *not* start DRBD from the init script.
> > > # chkconfig drbd off
> >
> > *** OK, removed it from start on boot.
> >
> > > You should *NOT* configure "no-disk-drain".
> > > It is likely to corrupt your data.
> >
> > ** OK, removed no-disk-drain from postgresql.res:
> >
> > # cat postgresql.res
> > resource postgresql {
> >     startup {
> >         wfc-timeout 15;
> >         degr-wfc-timeout 60;
> >     }
> >
> >     syncer {
> >         rate 150M;
> >         verify-alg md5;
> >     }
> >
> >     on ha-master {
> >         device /dev/drbd0;
> >         disk /dev/sdb1;
> >         address 172.70.65.210:7788;
> >         meta-disk internal;
> >     }
> >
> >     on ha-slave {
> >         device /dev/drbd0;
> >         disk /dev/sdb1;
> >         address 172.70.65.220:7788;
> >         meta-disk internal;
> >     }
> > }
> >
> > > You should configure monitoring ops for DRBD.
> > > One each for Master and Slave role, with different intervals.
> >
> > ** How can I do that?
> >
> > From:
> > http://www.drbd.org/users-guide-9.0/s-pacemaker-crm-drbd-backed-service.html
> >
> > crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
> >                     params drbd_resource="mysql" \
> >                     op monitor interval="29s" role="Master" \
> >                     op monitor interval="31s" role="Slave"
> >
> > > You probably need to "crm resource cleanup ..." a bit.
> >
> > ** Remade the crm configuration:
> >
> > crm(live)# configure
> > crm(live)configure# show
> > node ha-master
> > node ha-slave
> > primitive drbd_postgresql ocf:heartbeat:drbd \
> >     params drbd_resource="postgresql" \
> >     op monitor interval="29s" role="Master" \
> >     op monitor interval="31s" role="Slave"
> > primitive fs_postgresql ocf:heartbeat:Filesystem \
> >     params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> > primitive postgresqld lsb:postgresql
> > primitive vip_cluster ocf:heartbeat:IPaddr2 \
> >     params ip="172.70.65.200" nic="eth0:1"
> > group postgresql fs_postgresql vip_cluster postgresqld
> > ms ms_drbd_postgresql drbd_postgresql \
> >     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
> > order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
> > property $id="cib-bootstrap-options" \
> >     dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> >     cluster-infrastructure="openais" \
> >     expected-quorum-votes="2" \
> >     no-quorum-policy="ignore" \
> >     stonith-enabled="false"
> > rsc_defaults $id="rsc-options" \
> >     resource-stickiness="100"
> >
> > > You may need to manually remove fencing constraints (if DRBD finished
> > > the resync when no pacemaker was running yet, it would not have been
> > > able to remove it from its handler).
> >
> > ** How do I do that?
> > ** Only on ha-master?
> >
> > # drbdadm create-md postgresql
> > # drbdadm up postgresql
> > # drbdadm -- --overwrite-data-of-peer primary postgresql
>
> Nope.
> cat /proc/drbd
> crm configure show
>
> drbd UpToDate/UpToDate, but
> fencing constraints present?
> --> delete those fencing constraints
> crm configure
> delete id-of-those-constraints...
>
> You may need to *read the logs*.
> The answer will be in there.
>
> > > You have:
> > > Failed actions:
> > >     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > > status=complete): unknown error
> > >
> > > So start looking for that, and see what it complains about.
> >
> > # cat /var/log/syslog | grep drbd_postgresql
> > *** Syslog:
> >
> > Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:1_last_failure_0 on ha-slave returned 1 (unknown error) instead of the expected value: 0 (ok)
> > Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:1_last_failure_0 on ha-slave: unknown error (1)
> > Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:0_last_failure_0 on ha-master returned 1 (unknown error) instead of the expected value: 0 (ok)
> > Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:0_last_failure_0 on ha-master: unknown error (1)
> > Oct 14 11:10:08 ha-master pengine: [786]: info: clone_print: Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
> > Oct 14 11:10:08 ha-master pengine: [786]: info: short_print: Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
> > Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-slave
> > Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures (max=1000000)
>
> Boring :-(
> These logs just say "there was a fatal error before,
> and that's why I won't even try again".
> And they will tell you this repeatedly, every time
> the policy engine runs.
>
> You need to "crm resource cleanup ...",
> then watch the actual start attempt fail.
>
> (Or look in the logs for the last real start attempt,
> and its failure.)
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting  http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list -- I'm subscribed

--
Thomaz Luiz Santos
Linux User: #359356
http://thomaz.santos.googlepages.com/
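Spelled out as commands, the checklist Lars gives above might look like the
sketch below (the constraint id shown is only an example of the
drbd-fence-by-handler-* names that DRBD's crm-fence-peer.sh handler creates;
such a constraint may or may not be present on a given cluster):

    # 1. confirm the replication state: both nodes should report UpToDate/UpToDate
    cat /proc/drbd

    # 2. look for leftover fencing constraints in the cluster configuration
    crm configure show | grep drbd-fence

    # 3. delete any such constraint by its id (example id; yours may differ)
    crm configure delete drbd-fence-by-handler-postgresql

    # 4. clear the INFINITY failcounts so Pacemaker tries the resource again,
    #    then watch the fresh start/promote attempt fail (or succeed) in syslog
    crm resource cleanup ms_drbd_postgresql

This needs to be done only once, from any node, since the CIB is cluster-wide;
the log lines from the next real start attempt, not the recycled
last_failure_0 entries, are the ones worth reading.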