[DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary

Fri Dec 14 20:16:10 CET 2007

A little more info, please (drbd.conf, ha.cf and connection path info).

If you are using "fencing resource-only;", try commenting it out and
pulling the power again. That would at least help you narrow down where
the problem is. This option caused similar problems for me.
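For reference, here is a rough sketch of where that option usually sits in a DRBD 8.0 drbd.conf, with the fencing line commented out for the test. It assumes the resource is named "mysql" (matching drbddisk::mysql in the log below). The handler path is also an assumption: it varies by distribution, although the dopd path in the log suggests /usr/lib64/heartbeat on this box. After editing, make the same change on both nodes and apply it with "drbdadm adjust mysql".

```
resource mysql {
  # ... protocol, "on <host>" sections, net/syncer options unchanged ...
  disk {
    # ... other disk options unchanged ...
    # fencing resource-only;   # commented out for the power-pull test only
  }
  handlers {
    # dopd-based peer outdater shipped with Heartbeat; path is an assumption
    outdate-peer "/usr/lib64/heartbeat/drbd-peer-outdater";
  }
}
```

If failover works with fencing disabled, the problem is most likely in the outdate-peer/dopd path rather than in DRBD replication itself.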

On Fri, 2007-12-14 at 10:52 -0800, Art Age Software wrote:
> Hi all,
> 
> I've been working on setting up a two-node HA cluster with Heartbeat + DRBD.
> Everything has been going well and now I am in the testing phase.
> All initial tests looked good. Manual failover via heartbeat commands
> works as expected.
> Rebooting the machines works as expected with resources transitioning correctly.
> Finally, it was time for the biggest test of all... physically pulling
> the power on the primary.
> Now, this is a painful thing to have to do - but of course necessary.
> So, naturally I had hoped it would "just work" on the first try.
> Unfortunately, it did not. DRBD became primary on node2, but the file
> system never mounted.
> From looking at the logs, DRBD on node2 seems to have aborted
> Heartbeat's attempt at takeover. I have copied a relevant portion of
> the log below.
> Can anybody offer any insight into what went wrong? Clearly, it isn't
> a highly available system if node2 won't take over in the event that
> node1 disappears.
> 
> DRBD version is 8.0.6.
> 
> Thanks,
> 
> Sam
> 
> Dec 14 18:26:30 node2 harc[5405]: [5418]: info: Running
> /etc/ha.d/rc.d/status status
> Dec 14 18:26:30 node2 mach_down[5431]: [5480]: info: Taking over
> resource group IPaddr::x.x.x.x/x
> Dec 14 18:26:30 node2 ResourceManager[5482]: [5509]: info: Acquiring
> resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x
> IPaddr::x.x.x.x/x drbddisk::mysql
> Filesystem::/dev/drbd0::/drbd::ext3::noatime
> Dec 14 18:26:30 node2 IPaddr[5468]: [5561]: INFO:  Running OK
> Dec 14 18:26:30 node2 heartbeat: [5406]: info: Local Resource
> acquisition completed.
> Dec 14 18:26:30 node2 IPaddr[5526]: [5579]: INFO:  Resource is stopped
> Dec 14 18:26:30 node2 ResourceManager[5482]: [6076]: info: Running
> /etc/ha.d/resource.d/drbddisk mysql start
> Dec 14 18:26:30 node2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
> Dec 14 18:26:31 node2 /usr/lib64/heartbeat/dopd: [4780]: info:
> send_message_to_the_peer: sending start_outdate message to the other
> node node2 -> node1
> Dec 14 18:26:49 node2 ResourceManager[5482]: [6110]: ERROR: Return
> code 20 from /etc/ha.d/resource.d/drbddisk
> Dec 14 18:26:49 node2 ResourceManager[5482]: [6111]: CRIT: Giving up
> resources due to failure of drbddisk::mysql
> Dec 14 18:26:50 node2 ResourceManager[5482]: [6112]: info: Releasing
> resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x
> IPaddr::x.x.x.x/x drbddisk::mysql
> Filesystem::/dev/drbd0::/drbd::ext3::noatime
> Dec 14 18:26:50 node2 ResourceManager[5482]: [6127]: info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 noatime stop
> Dec 14 18:26:50 node2 Filesystem[6140]: [6169]: INFO: Running stop for
> /dev/drbd0 on /drbd
> Dec 14 18:26:50 node2 Filesystem[6129]: [6178]: INFO:  Success
> Dec 14 18:26:50 node2 ResourceManager[5482]: [6193]: info: Running
> /etc/ha.d/resource.d/drbddisk mysql stop