Hi all,

I've been working on setting up a two-node HA cluster with Heartbeat + DRBD. Everything has been going well, and I am now in the testing phase. All initial tests looked good: manual failover via heartbeat commands works as expected, and rebooting the machines works as expected, with resources transitioning correctly.

Finally, it was time for the biggest test of all: physically pulling power on the primary. This is a painful thing to have to do, but of course necessary. Naturally, I had hoped it would "just work" on the first try. Unfortunately, it did not. DRBD became primary on node2, but the file system never mounted.

From looking at the logs, DRBD on node2 seems to have aborted Heartbeat's attempt at takeover. I have copied a relevant portion of the log below. Can anybody offer any insight into what went wrong? Clearly, it isn't a highly available system if node2 won't take over when node1 disappears.

DRBD version is 8.0.6.

Thanks,
Sam

Dec 14 18:26:30 node2 harc[5405]: [5418]: info: Running /etc/ha.d/rc.d/status status
Dec 14 18:26:30 node2 mach_down[5431]: [5480]: info: Taking over resource group IPaddr::x.x.x.x/x
Dec 14 18:26:30 node2 ResourceManager[5482]: [5509]: info: Acquiring resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x drbddisk::mysql Filesystem::/dev/drbd0::/drbd::ext3::noatime
Dec 14 18:26:30 node2 IPaddr[5468]: [5561]: INFO: Running OK
Dec 14 18:26:30 node2 heartbeat: [5406]: info: Local Resource acquisition completed.
Dec 14 18:26:30 node2 IPaddr[5526]: [5579]: INFO: Resource is stopped
Dec 14 18:26:30 node2 ResourceManager[5482]: [6076]: info: Running /etc/ha.d/resource.d/drbddisk mysql start
Dec 14 18:26:30 node2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Dec 14 18:26:31 node2 /usr/lib64/heartbeat/dopd: [4780]: info: send_message_to_the_peer: sending start_outdate message to the other node node2 -> node1
Dec 14 18:26:49 node2 ResourceManager[5482]: [6110]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
Dec 14 18:26:49 node2 ResourceManager[5482]: [6111]: CRIT: Giving up resources due to failure of drbddisk::mysql
Dec 14 18:26:50 node2 ResourceManager[5482]: [6112]: info: Releasing resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x drbddisk::mysql Filesystem::/dev/drbd0::/drbd::ext3::noatime
Dec 14 18:26:50 node2 ResourceManager[5482]: [6127]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 noatime stop
Dec 14 18:26:50 node2 Filesystem[6140]: [6169]: INFO: Running stop for /dev/drbd0 on /drbd
Dec 14 18:26:50 node2 Filesystem[6129]: [6178]: INFO: Success
Dec 14 18:26:50 node2 ResourceManager[5482]: [6193]: info: Running /etc/ha.d/resource.d/drbddisk mysql stop
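
For anyone reading the excerpt, the timing may be easier to see with a small script. The following is only a minimal sketch in Python that measures the gap between the drbddisk start and its failure, using four lines copied from the log above. The names LOG, parse and seconds_between are made up for this illustration and are not part of Heartbeat or DRBD. The year prepended before parsing is an arbitrary placeholder, since syslog timestamps carry none, and the %b month parsing assumes an English locale.

```python
import re
from datetime import datetime

# Four lines copied verbatim from the log excerpt in this post.
LOG = """\
Dec 14 18:26:30 node2 ResourceManager[5482]: [6076]: info: Running /etc/ha.d/resource.d/drbddisk mysql start
Dec 14 18:26:30 node2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
Dec 14 18:26:49 node2 ResourceManager[5482]: [6110]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
Dec 14 18:26:49 node2 ResourceManager[5482]: [6111]: CRIT: Giving up resources due to failure of drbddisk::mysql
"""

# Syslog layout: "<Mon> <day> <HH:MM:SS> <host> <rest of message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def parse(log_text):
    """Return a list of (timestamp, host, message) tuples."""
    entries = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        # Assumption: a placeholder year, only so that strptime gets a full date.
        ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S")
        entries.append((ts, m.group(2), m.group(3)))
    return entries


def seconds_between(entries, start_marker, end_marker):
    """Seconds from the first message containing start_marker to the first containing end_marker."""
    start = next(ts for ts, _, msg in entries if start_marker in msg)
    end = next(ts for ts, _, msg in entries if end_marker in msg)
    return (end - start).total_seconds()


entries = parse(LOG)
delay = seconds_between(entries, "drbddisk mysql start", "Return code 20")
failures = [msg for _, _, msg in entries if "ERROR:" in msg or "CRIT:" in msg]

print(f"drbddisk start -> failure: {delay:.0f} s")  # prints: drbddisk start -> failure: 19 s
for msg in failures:
    print(msg)
```

On this excerpt, the script shows that drbddisk returned code 20 about 19 seconds after it was started, immediately after the outdate-peer helper was invoked. It only summarizes what the log already says and does not explain the cause.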