Hi all,
I'm still struggling with this problem. Since my last mail, I've
simplified my setup: one DRBD resource with a single file system
resource on top. I normally have STONITH in place and working, but it is
also removed for simplicity.
Things that work as expected:
- Pulling the dedicated DRBD network cable. A location constraint is
created as expected (preventing promotion of the now disconnected slave
node). The constraint is removed after re-plugging the cable.
- Rebooting the slave node / putting the slave node in standby mode. No
constraints (as expected), no problems.
- Migrating the file system resource. File system unmounts, slave node
becomes master, file system mounts, no problems.
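For context: the automatic creation/removal of the constraint comes from
the DRBD fence-peer handlers. The relevant part of my drbd.conf looks
roughly like this (paraphrased from memory; the full config is in the
pastebin below):

    resource r0 {
      disk {
        fencing resource-only;  # place/remove a constraint instead of real STONITH
      }
      handlers {
        # creates the location constraint when the peer is unreachable
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # removes the constraint again after resync completes
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }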
Things that do not work as expected:
- Rebooting the master node / putting the master node in standby mode.
The location constraint is created, which prevents the slave from
becoming master. To recover, I have to bring the old master node online
again and remove the constraint by hand.
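"By hand" currently means something like the following on the surviving
node (the constraint id below is an example of what crm-fence-peer.sh
generates; the exact id depends on the resource name, so I grep for it
first):

    # find the fencing constraint left behind by crm-fence-peer.sh
    crm configure show | grep drbd-fence-by-handler
    # remove it so the slave can be promoted
    # (replace the id with the one found above)
    crm configure delete drbd-fence-by-handler-r0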
My setup:
Ubuntu 10.04 running 2.6.32-41-generic / x86_64
DRBD 8.3.13 (self compiled)
Pacemaker 1.1.6 (from HA maintainers PPA)
Corosync 1.4.2 (from HA maintainers PPA)
Network:
10.0.0.0/24 on eth0: network for 'normal' connectivity
172.16.0.1 <-> 172.16.0.2 on eth1: dedicated network for DRBD
corosync-cfgtool -s output:
Printing ring status.
Local node ID 16781484
RING ID 0
id = 172.16.0.1
status = ring 0 active with no faults
RING ID 1
id = 10.0.0.71
status = ring 1 active with no faults
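The two rings come from the redundant-ring (rrp) setup in corosync.conf,
which is roughly the following (abbreviated; see the pastebin below for
the actual file):

    totem {
      rrp_mode: passive       # assumption from memory; actual mode is in the pastebin
      interface {
        ringnumber: 0
        bindnetaddr: 172.16.0.0   # dedicated DRBD network (ring 0)
        ...
      }
      interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0     # 'normal' network (ring 1)
        ...
      }
    }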
Configuration files:
http://pastebin.com/VUgHcuQ0
Log of a failed failover (master node):
http://pastebin.com/f5amFMzY
Log of a failed failover (slave node):
http://pastebin.com/QHBPnHFQ
I hope somebody can shed some light on this for me...
Thank you in advance, kind regards,
Dirk