Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
----- Original Message -----
> From: "Dirk Bonenkamp - ProActive" <dirk at proactive.nl>
> To: drbd-user at lists.linbit.com
> Sent: Friday, August 3, 2012 4:17:46 AM
> Subject: Re: [DRBD-user] crm-fence-peer.sh & maintenance / reboots
>
> Hi all,
>
> I'm still struggling with this problem. Since my last mail, I've
> simplified my setup: 1 DRBD resource with only 1 file system resource.
> I normally have stonith in place & working, but this has also been
> removed for simplicity.
>
> Things that work as expected:
> - Pulling the dedicated DRBD network cable. The location constraint is
>   created as expected (preventing promotion of the now disconnected
>   slave node). The constraint gets removed after re-plugging the cable.
> - Rebooting the slave node / putting the slave node in standby mode.
>   No constraints (as expected), no problems.
> - Migrating the file system resource. The file system unmounts, the
>   slave node becomes master, the file system mounts, no problems.
>
> Things that do not work as expected:
> - Rebooting the master node / putting the master node in standby mode.
>   The location constraint is created, which prevents the slave from
>   becoming master... To correct this, I have to bring the old master
>   node online again and remove the constraint by hand.
>
> My setup:
> Ubuntu 10.04 running 2.6.32-41-generic / x86_64
> DRBD 8.3.13 (self-compiled)

Hi Dirk!

This might be the bug affecting fencing that I found when using the -41
kernel in Ubuntu with DRBD 8.3.13:

https://bugs.launchpad.net/ubuntu/+source/drbd8/+bug/1000355

> Pacemaker 1.1.6 (from HA maintainers PPA)
> Corosync 1.4.2 (from HA maintainers PPA)
>
> Network:
> 10.0.0.0/24 on eth0: network for 'normal' connectivity
> 172.16.0.1 <-> 172.16.0.2 on eth1: dedicated network for DRBD
>
> corosync-cfgtool -s output:
>
> Printing ring status.
> Local node ID 16781484
> RING ID 0
>         id      = 172.16.0.1
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 10.0.0.71
>         status  = ring 1 active with no faults

Look here for a second step required to verify that the corosync rings
are actually OK when it's only a two-node cluster:

http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership

> Configuration files:
> http://pastebin.com/VUgHcuQ0
>
> Log of a failed failover (master node):
> http://pastebin.com/f5amFMzY
>
> Log of a failed failover (slave node):
> http://pastebin.com/QHBPnHFQ

How about the output of /proc/drbd and crm configure show while the
master node is in standby?

HTH,
Jake
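
P.S. In case it helps, here is a rough sketch of how I'd collect that
output and clear a leftover constraint by hand. The constraint id is an
assumption on my side: crm-fence-peer.sh normally names it with a
"drbd-fence-by-handler" prefix, but check your own "crm configure show"
output for the exact id, and substitute your real master/slave resource
name for the placeholder "ms_drbd_r0" below:

    # With the master node in standby, on the remaining node:
    cat /proc/drbd          # connection state and roles as DRBD sees them
    crm configure show      # full CIB, including any fencing constraints

    # Show only what the fence handler added (id prefix assumed):
    crm configure show | grep drbd-fence-by-handler

    # Remove a leftover constraint by its id (example id, adjust to yours):
    crm configure delete drbd-fence-by-handler-ms_drbd_r0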
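
P.P.S. For the two-node membership check from the hastexo link: on
corosync 1.4 the quick test, if I remember it right, is to query the
runtime object database and make sure both node ids show up as joined
(the exact key below is from memory, so treat it as an assumption and
double-check against the article):

    # Both node ids should be listed with status=joined:
    corosync-objctl runtime.totem.pg.mrp.srp.members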