Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
> You should *not* start DRBD from the init script.
> # chkconfig drbd off
*** OK, removed DRBD from starting at boot.
> You should *NOT* configure "no-disk-drain".
> It is likely to corrupt your data.
** OK, removed the disk-drain setting from postgresql.res:
# cat postgresql.res
resource postgresql {
        startup {
                wfc-timeout 15;
                degr-wfc-timeout 60;
        }
        syncer {
                rate 150M;
                verify-alg md5;
        }
        on ha-master {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 172.70.65.210:7788;
                meta-disk internal;
        }
        on ha-slave {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 172.70.65.220:7788;
                meta-disk internal;
        }
}
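** To check and apply that change I plan to run the following on both nodes (if I understand drbdadm correctly, "dump" only parses and prints the config, and "adjust" applies the differences to the running resource):
# drbdadm dump postgresql
# drbdadm adjust postgresql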
> You should configure monitoring ops for DRBD.
> One each for Master and Slave role, with different intervals.
** How can I do that?
From:
http://www.drbd.org/users-guide-9.0/s-pacemaker-crm-drbd-backed-service.html
crm(live)configure# primitive drbd_mysql ocf:linbit:drbd \
params drbd_resource="mysql" \
op monitor interval="29s" role="Master" \
op monitor interval="31s" role="Slave"
> You probably need to "crm resource cleanup ..." a bit.
** I remade the crm configuration:
crm(live)# configure
crm(live)configure# show
node ha-master
node ha-slave
primitive drbd_postgresql ocf:heartbeat:drbd \
params drbd_resource="postgresql" \
op monitor interval="29s" role="Master" \
op monitor interval="31s" role="Slave"
primitive fs_postgresql ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/mnt" fstype="ext4"
primitive postgresqld lsb:postgresql
primitive vip_cluster ocf:heartbeat:IPaddr2 \
params ip="172.70.65.200" nic="eth0:1"
group postgresql fs_postgresql vip_cluster postgresqld
ms ms_drbd_postgresql drbd_postgresql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
property $id="cib-bootstrap-options" \
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
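** For the cleanup, is this enough? (assuming the resource names above; as far as I understand, cleanup clears the failed operations and failcounts cluster-wide, so running it on one node should be sufficient)
# crm resource cleanup ms_drbd_postgresql
# crm resource cleanup postgresql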
> You may need to manually remove fencing constraints (if DRBD finished
> the resync when no pacemaker was running yet, it would not have been
> able to remove it from its handler).
** How do I do that?
** Only on ha-master?
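** Is something like this what you mean? (guessing that crm-fence-peer.sh created a location constraint whose id starts with "drbd-fence-by-handler", and that deleting it from either node is enough because the CIB is cluster-wide)
# crm configure show | grep drbd-fence
# crm configure delete <id-of-the-constraint-printed-above>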
** These are the commands I used to initialize the DRBD resource:
# drbdadm create-md postgresql
# drbdadm up postgresql
# drbdadm -- --overwrite-data-of-peer primary postgresql
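** Once the initial sync finishes (cs:Connected ds:UpToDate/UpToDate in /proc/drbd), is it correct to hand the device back to Pacemaker like this, so that only the cluster manages it from then on?
# cat /proc/drbd
# drbdadm secondary postgresql
# drbdadm down postgresql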
> You may need to *read the logs*.
> The answer will be in there.
> You have:
> Failed actions:
> drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> status=complete): unknown error
> So start looking for that, and see what it complains about.
# grep drbd_postgresql /var/log/syslog
*** Syslog output:
Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op:
drbd_postgresql:1_last_failure_0 on ha-slave returned 1 (unknown error)
instead of the expected value: 0 (ok)
Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing
failed op drbd_postgresql:1_last_failure_0 on ha-slave: unknown error (1)
Oct 14 11:10:08 ha-master pengine: [786]: debug: unpack_rsc_op:
drbd_postgresql:0_last_failure_0 on ha-master returned 1 (unknown error)
instead of the expected value: 0 (ok)
Oct 14 11:10:08 ha-master pengine: [786]: WARN: unpack_rsc_op: Processing
failed op drbd_postgresql:0_last_failure_0 on ha-master: unknown error (1)
Oct 14 11:10:08 ha-master pengine: [786]: info: clone_print: Master/Slave
Set: ms_drbd_postgresql [drbd_postgresql]
Oct 14 11:10:08 ha-master pengine: [786]: info: short_print: Stopped:
[ drbd_postgresql:0 drbd_postgresql:1 ]
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount:
ms_drbd_postgresql has failed INFINITY times on ha-slave
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness:
Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures
(max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount:
ms_drbd_postgresql has failed INFINITY times on ha-slave
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness:
Forcing ms_drbd_postgresql away from ha-slave after 1000000 failures
(max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount:
ms_drbd_postgresql has failed INFINITY times on ha-master
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness:
Forcing ms_drbd_postgresql away from ha-master after 1000000 failures
(max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: get_failcount:
ms_drbd_postgresql has failed INFINITY times on ha-master
Oct 14 11:10:08 ha-master pengine: [786]: WARN: common_apply_stickiness:
Forcing ms_drbd_postgresql away from ha-master after 1000000 failures
(max=1000000)
Oct 14 11:10:08 ha-master pengine: [786]: info: rsc_merge_weights:
ms_drbd_postgresql: Rolling back scores from fs_postgresql
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: All
nodes for resource drbd_postgresql:0 are unavailable, unclean or shutting
down (ha-master: 1, -1000000)
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: Could
not allocate a node for drbd_postgresql:0
Oct 14 11:10:08 ha-master pengine: [786]: info: native_color: Resource
drbd_postgresql:0 cannot run anywhere
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: All
nodes for resource drbd_postgresql:1 are unavailable, unclean or shutting
down (ha-master: 1, -1000000)
Oct 14 11:10:08 ha-master pengine: [786]: debug: native_assign_node: Could
not allocate a node for drbd_postgresql:1
Oct 14 11:10:08 ha-master pengine: [786]: info: native_color: Resource
drbd_postgresql:1 cannot run anywhere
Oct 14 11:10:08 ha-master pengine: [786]: debug: clone_color: Allocated 0
ms_drbd_postgresql instances of a possible 2
Oct 14 11:10:08 ha-master pengine: [786]: info: rsc_merge_weights:
ms_drbd_postgresql: Rolling back scores from fs_postgresql
Oct 14 11:10:08 ha-master pengine: [786]: debug: master_color:
drbd_postgresql:0 master score: 0
Oct 14 11:10:08 ha-master pengine: [786]: debug: master_color:
drbd_postgresql:1 master score: 0
Oct 14 11:10:08 ha-master pengine: [786]: info: master_color:
ms_drbd_postgresql: Promoted 0 instances of a possible 1 to master
Oct 14 11:10:08 ha-master pengine: [786]: debug: master_create_actions:
Creating actions for ms_drbd_postgresql
Oct 14 11:10:08 ha-master pengine: [786]: notice: LogActions: Leave
drbd_postgresql:0 (Stopped)
Oct 14 11:10:08 ha-master pengine: [786]: notice: LogActions: Leave
drbd_postgresql:1 (Stopped)
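** Should I be looking on ha-slave itself for the real reason the start failed? I am guessing the lrmd there logged what the drbd resource agent complained about, e.g.:
root@ha-slave:~# grep -E 'lrmd|drbd' /var/log/syslog
** and that the INFINITY failcounts shown above can be checked with:
# crm_mon -1 -f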
On Fri, Oct 11, 2013 at 7:20 PM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
> On Fri, Oct 11, 2013 at 05:08:04PM -0300, Thomaz Luiz Santos wrote:
> > I'm trying to build a sample cluster in virtual machines and then
> > migrate it to physical machines; however, I have problems configuring
> > Pacemaker (crm) to start the resources and fail over.
> >
> > I can mount the device /dev/drbd0 on the primary node and start
> > postgresql manually, but when the crm resources are used, it cannot
> > mount the device and start postgresql.
>
> You should *not* start DRBD from the init script.
> # chkconfig drbd off
>
> You should *NOT* configure "no-disk-drain".
> It is likely to corrupt your data.
>
> You should configure monitoring ops for DRBD.
> One each for Master and Slave role, with different intervals.
>
> You probably need to "crm resource cleanup ..." a bit.
>
> You may need to manually remove fencing constraints (if DRBD finished
> the resync when no pacemaker was running yet, it would not have been
> able to remove it from its handler).
>
> You may need to *read the logs*.
> The answer will be in there.
>
> You have:
> > Failed actions:
> > drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > status=complete): unknown error
>
> So start looking for that, and see what it complains about.
>
> Cheers,
> Lars
>
>
> > I rebooted the virtual machines without success:
> > DRBD does not become primary, /dev/drbd0 is not mounted, and
> > postgresql does not start :-(
> >
> >
> > DRBD Version: 8.3.11 (api:88)
> > Corosync Cluster Engine, version '1.4.2'
> > Pacemaker 1.1.6
> >
> >
> >
> > **** After rebooting the virtual machines ****
> >
> > ha-slave:
> >
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> > 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
> > ns:0 nr:28672 dw:28672 dr:0 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n
> > oos:0
> >
> >
> > ha-master:
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> > 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
> > ns:28672 nr:0 dw:0 dr:28672 al:0 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n
> > oos:0
> >
> >
> >
> >
> >
> > crm(live)# configure
> > crm(live)configure# show
> > node ha-master
> > node ha-slave
> > primitive drbd_postgresql ocf:heartbeat:drbd \
> > params drbd_resource="postgresql"
> > primitive fs_postgresql ocf:heartbeat:Filesystem \
> > params device="/dev/drbd/by-res/postgresql" directory="/mnt" fstype="ext4"
> > primitive postgresqld lsb:postgresql
> > primitive vip_cluster ocf:heartbeat:IPaddr2 \
> > params ip="172.70.65.200" nic="eth0:1"
> > group postgresql fs_postgresql vip_cluster postgresqld \
> > meta target-role="Started"
> > ms ms_drbd_postgresql drbd_postgresql \
> > meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > colocation postgresql_on_drbd inf: postgresql ms_drbd_postgresql:Master
> > order postgresql_after_drbd inf: ms_drbd_postgresql:promote postgresql:start
> > property $id="cib-bootstrap-options" \
> > dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> > cluster-infrastructure="openais" \
> > expected-quorum-votes="2" \
> > stonith-enabled="false" \
> > no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> > resource-stickiness="100"
> >
> >
> >
> > crm(live)# resource
> > crm(live)resource# list
> > Master/Slave Set: ms_drbd_postgresql [drbd_postgresql]
> > Stopped: [ drbd_postgresql:0 drbd_postgresql:1 ]
> > Resource Group: postgresql
> > fs_postgresql (ocf::heartbeat:Filesystem) Stopped
> > vip_cluster (ocf::heartbeat:IPaddr2) Stopped
> > postgresqld (lsb:postgresql) Stopped
> >
> >
> >
> >
> > ============
> > Last updated: Fri Oct 11 14:22:50 2013
> > Last change: Fri Oct 11 14:11:06 2013 via cibadmin on ha-slave
> > Stack: openais
> > Current DC: ha-slave - partition with quorum
> > Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> > 2 Nodes configured, 2 expected votes
> > 5 Resources configured.
> > ============
> >
> > Online: [ ha-slave ha-master ]
> >
> >
> > Failed actions:
> > drbd_postgresql:0_start_0 (node=ha-slave, call=14, rc=1,
> > status=complete): unknown error
> > drbd_postgresql:0_start_0 (node=ha-master, call=18, rc=1,
> > status=complete): unknown error
> >
> >
> >
> >
> > **** This is my DRBD global_common.conf ****
> >
> > global {
> >         usage-count yes;
> >         # minor-count dialog-refresh disable-ip-verification
> > }
> >
> > common {
> >         protocol C;
> >
> >         handlers {
> >                 pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> >                 pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> >                 local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
> >                 fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> >                 after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> >                 # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
> >                 # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
> >                 # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
> >                 # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
> >         }
> >
> >         startup {
> >                 # wfc-timeout 15;
> >                 # degr-wfc-timeout 60;
> >                 # outdated-wfc-timeout wait-after-sb
> >         }
> >
> >         disk {
> >                 # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
> >                 # no-disk-drain no-md-flushes max-bio-bvecs
> >         }
> >
> >         net {
> >                 # cram-hmac-alg sha1;
> >                 # shared-secret "secret";
> >                 # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
> >                 # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
> >                 # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
> >         }
> >
> >         syncer {
> >                 # rate 150M;
> >                 # rate after al-extents use-rle cpu-mask verify-alg csums-alg
> >         }
> > }
> >
> >
> > **** This is my postgresql.res ****
> >
> > resource postgresql {
> > startup {
> > wfc-timeout 15;
> > degr-wfc-timeout 60;
> > }
> >
> > syncer {
> > rate 150M;
> > verify-alg md5;
> > }
> >
> > disk {
> > on-io-error detach;
> > no-disk-barrier;
> > no-disk-flushes;
> > no-disk-drain;
> > fencing resource-only;
> > }
> >
> > on ha-master {
> > device /dev/drbd0;
> > disk /dev/sdb1;
> > address 172.70.65.210:7788;
> > meta-disk internal;
> > }
> >
> > on ha-slave {
> > device /dev/drbd0;
> > disk /dev/sdb1;
> > address 172.70.65.220:7788;
> > meta-disk internal;
> > }
> >
> >
> > }
> >
> >
> > **** This is my corosync.conf ****
> >
> >
> > compatibility: whitetank
> >
> > totem {
> > version: 2
> > secauth: off
> > threads: 0
> > interface {
> > ringnumber: 0
> > bindnetaddr: 172.70.65.200
> > mcastaddr: 226.94.1.1
> > mcastport: 5405
> > ttl: 1
> > }
> > }
> >
> > logging {
> > fileline: off
> > to_stderr: yes
> > to_logfile: yes
> > to_syslog: yes
> > logfile: /var/log/cluster/corosync.log
> > debug: on
> > timestamp: on
> > logger_subsys {
> > subsys: AMF
> > debug: off
> > }
> > }
> >
> > amf {
> > mode: disabled
> > }
> >
> > aisexec{
> > user : root
> > group : root
> > }
> >
> > service{
> > # Load the Pacemaker Cluster Resource Manager
> > name : pacemaker
> > ver : 0
> > }
> >
> >
> >
> > DRBD and postgresql started manually:
> >
> >
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> > 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
> > ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:0
> >
> >
> > version: 8.3.13 (api:88/proto:86-96)
> > srcversion: 697DE8B1973B1D8914F04DB
> > 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
> > ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:0
> >
> >
> >
> > root@ha-master:/mnt# df -hT
> > Filesystem     Type     Size  Used Avail Use% Mounted on
> > /dev/sda1 ext4 4,0G 1,8G 2,1G 47% /
> > udev devtmpfs 473M 4,0K 473M 1% /dev
> > tmpfs tmpfs 193M 264K 193M 1% /run
> > none tmpfs 5,0M 4,0K 5,0M 1% /run/lock
> > none tmpfs 482M 17M 466M 4% /run/shm
> > /dev/drbd0 ext4 2,0G 69M 1,9G 4% /mnt
> >
> >
> > root@ha-master:/mnt# service postgresql status
> > Running clusters: 9.1/main
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list -- I'm subscribed
>
--
------------------------------
Thomaz Luiz Santos
Linux User: #359356
http://thomaz.santos.googlepages.com/