Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
> You should *not* start DRBD from the init script.
> # chkconfig drbd off

*** OK, removed the start at boot.

> You should *NOT* configure "no-disk-drain".
> It is likely to corrupt your data.

** OK, removed the disk drain options from postgresql.res:

# cat postgresql.res
resource postgresql {
        startup {
                wfc-timeout 15;
                degr-wfc-timeout 60;
        }

        syncer {
                rate 150M;
                verify-alg md5;
        }

        on ha-master {
                device    /dev/drbd0;
                disk      /dev/sdb1;
                address   172.70.65.210:7788;
                meta-disk internal;
        }

        on ha-slave {
                device    /dev/drbd0;
                disk      /dev/sdb1;
                address   172.70.65.220:7788;
                meta-disk internal;
        }
}

> You should configure monitoring ops for DRBD.
> One each for Master and Slave role, with different intervals.

** How can I do that? Like this, from
http://www.drbd.org/users-guide-9.0/s-pacemaker-crm-drbd-backed-service.html ?

crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"

> You probably need to "crm resource cleanup ..." a bit.

** I remade the crm configuration:

crm(live)# configure
crm(live)configure# show
node ha-master
node ha-slave
primitive drbd_postgresql ocf:heartbeat:drbd \
        params drbd_resource="postgresql" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
primitive fs_postgresql ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mnt" fstype="ext4"
primitive postgresqld lsb:postgresql
primitive vip_cluster ocf:heartbeat:IPaddr2 \
        params ip="172.70.65.200" nic="eth0:1"
group postgresql fs_postgresql vip_cluster postgresqld
ms ms_drbd_postgresql drbd_postgresql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

> You may need to manually remove fencing constraints (if DRBD finished
> the resync when no pacemaker was running yet, it would not have been
> able to remove it from its handler).

** How do I do that? Only on ha-master?
** (a rough sketch follows below, after the quoted advice)

# drbdadm create-md postgresql
# drbdadm up postgresql
# drbdadm -- --overwrite-data-of-peer primary postgresql

> You may need to *read the logs*.
> The answer will be in there.

> You have:
> Failed actions:
>     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
>     status=complete): unknown error
> So start looking for that, and see what it complains about.
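[For reference: a rough sketch of what removing a leftover fence constraint could look like. The constraint id below is only a guess based on the usual crm-fence-peer.sh naming; use whatever id "crm configure show" actually reports. The constraint lives in the cluster-wide CIB, so deleting it once, from either node, should be enough.]

    # look for a location constraint left behind by crm-fence-peer.sh
    crm configure show | grep drbd-fence

    # delete it by the id actually shown (the id below is only an example)
    crm configure delete drbd-fence-by-handler-postgresql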
# cat /var/log/syslog | grep drbd_postgresql

*** Syslog:

Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:1_last_failure_0 on ha-slave returned 1 (unknown error) instead of the expected value: 0 (ok)
Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:1_last_failure_0 on ha-slave: unknown error (1)
Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op: drbd_postgresql:0_last_failure_0 on ha-master returned 1 (unknown error) instead of the expected value: 0 (ok)
Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing failed op drbd_postgresql:0_last_failure_0 on ha-master: unknown error (1)
Oct 14 11:10:08 ha-master pengine: [786]: info: clone_print:  Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
Oct 14 11:10:08 ha-master pengine: [786]: info: short_print:      Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-slave
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures (max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-slave
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures (max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-master
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-master after 1000000 failures (max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount: ms_drbd_postgresql has failed INFINITY times on ha-master
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness: Forcing ms_drbd_postgresql away from ha-master after 1000000 failures (max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: rsc_merge_weights: ms_drbd_postgresql: Rolling back scores from fs_postgresql
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: All nodes for resource drbd_postgresql:0 are unavailable, unclean or shutting down (ha-master: 1, -1000000)
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: Could not allocate a node for drbd_postgresql:0
Oct 14 11:10:08 ha-master pengine: [786]: info: native_color: Resource drbd_postgresql:0 cannot run anywhere
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: All nodes for resource drbd_postgresql:1 are unavailable, unclean or shutting down (ha-master: 1, -1000000)
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: Could not allocate a node for drbd_postgresql:1
Oct 14 11:10:08 ha-master pengine: [786]: info: native_color: Resource drbd_postgresql:1 cannot run anywhere
Oct 14 11:10:08 ha-master pengine: [786]: debug: clone_color: Allocated 0 ms_drbd_postgresql instances of a possible 2
Oct 14 11:10:08 ha-master pengine: [786]: info: rsc_merge_weights: ms_drbd_postgresql: Rolling back scores from fs_postgresql
Oct 14 11:10:08 ha-master pengine: [786]: debug: master_color: drbd_postgresql:0 master score: 0
Oct 14 11:10:08 ha-master pengine: [786]: debug: master_color: drbd_postgresql:1 master score: 0
Oct 14 11:10:08 ha-master pengine: [786]: info: master_color: ms_drbd_postgresql: Promoted 0 instances of a possible 1 to master
Oct 14 11:10:08 ha-master pengine: [786]: debug: master_create_actions: Creating actions for ms_drbd_postgresql
Oct 14 11:10:08 ha-master pengine: [786]: notice: LogActions: Leave   drbd_postgresql:0 (Stopped)
Oct 14 11:10:08 ha-master pengine: [786]: notice: LogActions: Leave   drbd_postgresql:1 (Stopped)
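[The "failed INFINITY times" and "Forcing ms_drbd_postgresql away" messages above mean the failcount has reached INFINITY, so Pacemaker will not retry until it is cleared. A minimal sketch, assuming the resource and node names from the configuration above; the clone instance name (e.g. drbd_postgresql:0) may be needed instead of the bare primitive name:]

    # inspect the accumulated failcounts
    crm resource failcount drbd_postgresql show ha-master
    crm resource failcount drbd_postgresql show ha-slave

    # clear the failure history so the policy engine will try to start it again
    crm resource cleanup ms_drbd_postgresql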
On Fri, Oct 11, 2013 at 7:20 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Fri, Oct 11, 2013 at 05:08:04PM -0300, Thomaz Luiz Santos wrote:
> > I'm trying to build a sample cluster in virtual machines and then migrate
> > it to physical machines, but I have problems configuring pacemaker (crm)
> > to start up the resources and fail over.
> >
> > I can mount the device /dev/drbd0 on the primary node and start postgresql
> > manually, but as a crm resource it cannot mount the device and start
> > postgresql.
>
> You should *not* start DRBD from the init script.
> # chkconfig drbd off
>
> You should *NOT* configure "no-disk-drain".
> It is likely to corrupt your data.
>
> You should configure monitoring ops for DRBD.
> One each for Master and Slave role, with different intervals.
>
> You probably need to "crm resource cleanup ..." a bit.
>
> You may need to manually remove fencing constraints (if DRBD finished
> the resync when no pacemaker was running yet, it would not have been
> able to remove it from its handler).
>
> You may need to *read the logs*.
> The answer will be in there.
>
> You have:
> > Failed actions:
> >     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > status=complete): unknown error
>
> So start looking for that, and see what it complains about.
>
> Cheers,
> Lars
>
> > I rebooted the virtual machines without success:
> > DRBD does not become primary, /dev/drbd0 is not mounted, and postgresql
> > is not started :-(
> >
> > DRBD Version: 8.3.11 (api:88)
> > Corosync Cluster Engine, version '1.4.2'
> > Pacemaker 1.1.6
> >
> > **** after rebooting the virtual machines ****
> >
> > ha-slave:
> >
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> >  0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
> >     ns:0 nr:28672 dw:28672 dr:0 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:0
> >
> > ha-master:
> >
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> >  0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
> >     ns:28672 nr:0 dw:0 dr:28672 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:0
> >
> > crm(live)# configure
> > crm(live)configure# show
> > node ha-master
> > node ha-slave
> > primitive drbd_postgresql ocf:heartbeat:drbd \
> >         params drbd_resource="postgresql"
> > primitive fs_postgresql ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd/by-res/postgresql" directory="/mnt" fstype="ext4"
> > primitive postgresqld lsb:postgresql
> > primitive vip_cluster ocf:heartbeat:IPaddr2 \
> >         params ip="172.70.65.200" nic="eth0:1"
> > group postgresql fs_postgresql vip_cluster postgresqld \
> >         meta target-role="Started"
> > ms ms_drbd_postgresql drbd_postgresql \
> >         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
> > order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> > crm(live)# resource
> > crm(live)resource# list
> >  Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
> >      Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
> >  Resource Group: postgresql
> >      fs_postgresql   (ocf::heartbeat:Filesystem) Stopped
> >      vip_cluster     (ocf::heartbeat:IPaddr2) Stopped
> >      postgresqld     (lsb:postgresql) Stopped
> >
> > ============
> > Last updated: Fri Oct 11 14:22:50 2013
> > Last change: Fri Oct 11 14:11:06 2013 via cibadmin on ha-slave
> > Stack: openais
> > Current DC: ha-slave - partition with quorum
> > Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ ha-slave ha-master ]
> >
> > Failed actions:
> >     drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > status=complete): unknown error
> >     drbd_postgresql:0_start_0 (node=ha-master, call=18, rc=1,
> > status=complete): unknown error
> >
> >
> > **** that is my global_common on drbd ****
> >
> > global {
> >         usage-count yes;
> >         # minor-count dialog-refresh disable-ip-verification
> > }
> >
> > common {
> >         protocol C;
> >
> >         handlers {
> >                 pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> >                 pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> >                 local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
> >                 fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> >                 after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> >                 # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
> >                 # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
> >                 # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
> >                 # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
> >         }
> >
> >         startup {
> >                 # wfc-timeout 15;
> >                 # degr-wfc-timeout 60;
> >                 # outdated-wfc-timeout wait-after-sb
> >         }
> >
> >         disk {
> >                 # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
> >                 # no-disk-drain no-md-flushes max-bio-bvecs
> >         }
> >
> >         net {
> >                 # cram-hmac-alg sha1;
> >                 # shared-secret "secret";
> >                 # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
> >                 # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
> >                 # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
> >         }
> >
> >         syncer {
> >                 # rate 150M;
> >                 # rate after al-extents use-rle cpu-mask verify-alg csums-alg
> >         }
> > }
> >
> >
> > **** that is my postgresql.res ****
> >
> > resource postgresql {
> >         startup {
> >                 wfc-timeout 15;
> >                 degr-wfc-timeout 60;
> >         }
> >
> >         syncer {
> >                 rate 150M;
> >                 verify-alg md5;
> >         }
> >
> >         disk {
> >                 on-io-error detach;
> >                 no-disk-barrier;
> >                 no-disk-flushes;
> >                 no-disk-drain;
> >                 fencing resource-only;
> >         }
> >
> >         on ha-master {
> >                 device    /dev/drbd0;
> >                 disk      /dev/sdb1;
> >                 address   172.70.65.210:7788;
> >                 meta-disk internal;
> >         }
> >
> >         on ha-slave {
> >                 device    /dev/drbd0;
> >                 disk      /dev/sdb1;
> >                 address   172.70.65.220:7788;
> >                 meta-disk internal;
> >         }
> > }
> >
> >
> > **** that is my corosync.conf ****
> >
> > compatibility: whitetank
> >
> > totem {
> >         version: 2
> >         secauth: off
> >         threads: 0
> >         interface {
> >                 ringnumber: 0
> >                 bindnetaddr: 172.70.65.200
> >                 mcastaddr: 226.94.1.1
> >                 mcastport: 5405
> >                 ttl: 1
> >         }
> > }
> >
> > logging {
> >         fileline: off
> >         to_stderr: yes
> >         to_logfile: yes
> >         to_syslog: yes
> >         logfile: /var/log/cluster/corosync.log
> >         debug: on
> >         timestamp: on
> >         logger_subsys {
> >                 subsys: AMF
> >                 debug: off
> >         }
> > }
> >
> > amf {
> >         mode: disabled
> > }
> >
> > aisexec {
> >         user: root
> >         group: root
> > }
> >
> > service {
> >         # Load the Pacemaker Cluster Resource Manager
> >         name: pacemaker
> >         ver: 0
> > }
> >
> >
> > DRBD, postgresql, manually start:
> >
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> >  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
> >     ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:0
> >
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> >  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
> >     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:0
> >
> > root at ha-master:/mnt# df -hT
> > Sist. Arq.   Tipo      Tam.  Usado  Disp.  Uso%  Montado em
> > /dev/sda1    ext4      4,0G  1,8G   2,1G    47%  /
> > udev         devtmpfs  473M  4,0K   473M     1%  /dev
> > tmpfs        tmpfs     193M  264K   193M     1%  /run
> > none         tmpfs     5,0M  4,0K   5,0M     1%  /run/lock
> > none         tmpfs     482M   17M   466M     4%  /run/shm
> > /dev/drbd0   ext4      2,0G   69M   1,9G     4%  /mnt
> >
> > root at ha-master:/mnt# service postgresql status
> > Running clusters: 9.1/main
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list -- I'm subscribed
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
------------------------------
Thomaz Luiz Santos
Linux User: #359356
http://thomaz.santos.googlepages.com/