Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi Jake,

Looks like it is that bug... I've patched & recompiled DRBD and did some quick tests. Seems to work as it should now. Will do some more extensive tests next week and report back.

Thanks a lot, I've been breaking my head over this one for quite a while...

Cheers,

Dirk

On 3-8-2012 16:09, Jake Smith wrote:
> ----- Original Message -----
>> From: "Dirk Bonenkamp - ProActive" <dirk at proactive.nl>
>> To: drbd-user at lists.linbit.com
>> Sent: Friday, August 3, 2012 4:17:46 AM
>> Subject: Re: [DRBD-user] crm-fence-peer.sh & maintenance / reboots
>>
>> Hi all,
>>
>> I'm still struggling with this problem. Since my last mail, I've
>> simplified my setup: 1 DRBD resource with only 1 file system resource.
>> I normally have stonith in place & working, but this is also removed
>> for simplicity.
>>
>> Things that work as expected:
>> - Pulling the dedicated DRBD network cable. The location constraint is
>>   created as expected (preventing promotion of the now unconnected
>>   slave node). The constraint gets removed after re-plugging the cable.
>> - Rebooting the slave node / putting the slave node in standby mode.
>>   No constraints (as expected), no problems.
>> - Migrating the file system resource. The file system unmounts, the
>>   slave node becomes master, the file system mounts, no problems.
>>
>> Things that do not work as expected:
>> - Rebooting the master node / putting the master node in standby mode.
>>   The location constraint is created, which prevents the slave from
>>   becoming master... To correct this, I have to bring the old master
>>   node online again and remove the constraint by hand.
>>
>> My setup:
>> Ubuntu 10.04 running 2.6.32-41-generic / x86_64
>> DRBD 8.3.13 (self compiled)
>
> Hi Dirk!
>
> This might be the bug affecting fencing that I found when using the -41
> kernel in Ubuntu with DRBD 8.3.13:
>
> https://bugs.launchpad.net/ubuntu/+source/drbd8/+bug/1000355
>
>> Pacemaker 1.1.6 (from HA maintainers PPA)
>> Corosync 1.4.2 (from HA maintainers PPA)
>>
>> Network:
>> 10.0.0.0/24 on eth0: network for 'normal' connectivity
>> 172.16.0.1 <-> 172.16.0.2 on eth1: dedicated network for DRBD
>>
>> corosync-cfgtool -s output:
>>
>> Printing ring status.
>> Local node ID 16781484
>> RING ID 0
>>         id      = 172.16.0.1
>>         status  = ring 0 active with no faults
>> RING ID 1
>>         id      = 10.0.0.71
>>         status  = ring 1 active with no faults
>
> Look here for a second step required to verify the corosync rings are
> actually OK when it's only a two-node cluster:
> http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership
>
>> Configuration files:
>> http://pastebin.com/VUgHcuQ0
>>
>> Log of a failed failover (master node):
>> http://pastebin.com/f5amFMzY
>>
>> Log of a failed failover (slave node):
>> http://pastebin.com/QHBPnHFQ
>
> How about the output of /proc/drbd and crm configure show when the
> master node is in standby?
>
> HTH
> Jake
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
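
For context, the crm-fence-peer.sh behaviour discussed above is normally wired up in the DRBD 8.3 resource configuration roughly as sketched below. This is only a sketch under assumptions: the thread's actual configuration is only available via the pastebin link, and the resource name (r0), node names, device, and backing disk here are placeholders; the addresses are taken from the dedicated DRBD network mentioned in the thread.

  resource r0 {
    disk {
      fencing resource-only;    # fence via a cluster constraint, not node-level stonith
    }
    handlers {
      # Called when the peer becomes unreachable; adds a location constraint
      # (typically named drbd-fence-by-handler-<resource>) that blocks promotion
      # of the disconnected/outdated peer.
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      # Called after resync to the target completes; removes that constraint again.
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    on node-a {                           # placeholder hostname
      device    /dev/drbd0;
      disk      /dev/sdb1;                # placeholder backing device
      address   172.16.0.1:7788;
      meta-disk internal;
    }
    on node-b {                           # placeholder hostname
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   172.16.0.2:7788;
      meta-disk internal;
    }
  }

With fencing resource-only, DRBD fences through the cluster manager rather than powering off the peer, which is why a stale constraint shows up as a refused promotion instead of a stonith action.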
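Jake's two follow-ups (checking real corosync membership on a two-node cluster, and looking at /proc/drbd plus crm configure show with the master in standby), as well as Dirk's manual constraint removal, can be done roughly as below. This assumes corosync 1.x and the crm shell as used in the thread; the constraint name drbd-fence-by-handler-r0 is an assumption based on the placeholder resource name above, so check crm configure show for the actual ID first.

  # DRBD's own view of connection state and roles:
  cat /proc/drbd

  # Look for any fencing constraint the handler left behind:
  crm configure show | grep -A2 drbd-fence-by-handler

  # On a two-node cluster "no faults" per ring is not enough; check that both
  # node IDs are actually listed as joined members (corosync 1.x object database):
  corosync-objctl | grep members

  # If a stale constraint is blocking promotion, it can be removed by hand:
  crm configure delete drbd-fence-by-handler-r0   # replace with the real constraint ID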