[DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary

Fri Dec 14 19:52:17 CET 2007

Hi all,

I've been working on setting up a two-node HA cluster with Heartbeat + DRBD.
Everything has been going well and now I am in the testing phase.
All initial tests looked good. Manual failover via heartbeat commands
works as expected.
Rebooting the machines works as expected with resources transitioning correctly.
Finally, it was time for the biggest test of all: physically pulling
power on the primary.
Now, this is a painful thing to have to do - but of course necessary.
So, naturally I had hoped it would "just work" on the first try.
Unfortunately, it did not. DRBD became primary on node2, but the file
system never mounted.
From looking at the logs, DRBD on node2 seems to have aborted
Heartbeat's attempt at takeover. I have copied a relevant portion of
the log below.
Can anybody offer any insight into what went wrong? Clearly, it isn't
a highly available system if node2 won't take over in the event that
node1 disappears.

DRBD version is 8.0.6.

Thanks,

Sam

Dec 14 18:26:30 node2 harc[5405]: [5418]: info: Running
/etc/ha.d/rc.d/status status
Dec 14 18:26:30 node2 mach_down[5431]: [5480]: info: Taking over
resource group IPaddr::x.x.x.x/x
Dec 14 18:26:30 node2 ResourceManager[5482]: [5509]: info: Acquiring
resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x
IPaddr::x.x.x.x/x drbddisk::mysql
Filesystem::/dev/drbd0::/drbd::ext3::noatime
Dec 14 18:26:30 node2 IPaddr[5468]: [5561]: INFO:  Running OK
Dec 14 18:26:30 node2 heartbeat: [5406]: info: Local Resource
acquisition completed.
Dec 14 18:26:30 node2 IPaddr[5526]: [5579]: INFO:  Resource is stopped
Dec 14 18:26:30 node2 ResourceManager[5482]: [6076]: info: Running
/etc/ha.d/resource.d/drbddisk mysql start
Dec 14 18:26:30 node2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Dec 14 18:26:31 node2 /usr/lib64/heartbeat/dopd: [4780]: info:
send_message_to_the_peer: sending start_outdate message to the other
node node2 -> node1
Dec 14 18:26:49 node2 ResourceManager[5482]: [6110]: ERROR: Return
code 20 from /etc/ha.d/resource.d/drbddisk
Dec 14 18:26:49 node2 ResourceManager[5482]: [6111]: CRIT: Giving up
resources due to failure of drbddisk::mysql
Dec 14 18:26:50 node2 ResourceManager[5482]: [6112]: info: Releasing
resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x
IPaddr::x.x.x.x/x drbddisk::mysql
Filesystem::/dev/drbd0::/drbd::ext3::noatime
Dec 14 18:26:50 node2 ResourceManager[5482]: [6127]: info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 noatime stop
Dec 14 18:26:50 node2 Filesystem[6140]: [6169]: INFO: Running stop for
/dev/drbd0 on /drbd
Dec 14 18:26:50 node2 Filesystem[6129]: [6178]: INFO:  Success
Dec 14 18:26:50 node2 ResourceManager[5482]: [6193]: info: Running
/etc/ha.d/resource.d/drbddisk mysql stop
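
A minimal sketch, in Python, for pulling the failing step out of a log
like the one above, where the mail client has wrapped long entries onto
continuation lines. The function names (join_wrapped, failures) are
made up for illustration and are not part of Heartbeat or DRBD; the
sample lines are copied from the excerpt above. It only locates the
ERROR and CRIT entries and does not attempt to explain them.

```python
import re

# Start of a syslog entry, e.g. "Dec 14 18:26:30 node2 ..."
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")

def join_wrapped(lines):
    """Re-join entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            # A timestamp (or the very first line) starts a new entry.
            entries.append(line)
        else:
            # Anything else is a continuation of the previous entry.
            entries[-1] += " " + line.strip()
    return entries

def failures(entries):
    """Return the entries logged at ERROR or CRIT level."""
    return [e for e in entries if re.search(r"\b(ERROR|CRIT):", e)]

# Sample lines taken from the log excerpt above.
sample = [
    "Dec 14 18:26:30 node2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer",
    "Dec 14 18:26:49 node2 ResourceManager[5482]: [6110]: ERROR: Return",
    "code 20 from /etc/ha.d/resource.d/drbddisk",
    "Dec 14 18:26:49 node2 ResourceManager[5482]: [6111]: CRIT: Giving up",
    "resources due to failure of drbddisk::mysql",
]

found = failures(join_wrapped(sample))
for entry in found:
    print(entry)
```

On the sample this prints the two ResourceManager entries: the return
code 20 from drbddisk and the "Giving up resources" message that
follows it.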