A little more info, please (drbd.conf, ha.cf, and connection path info).

If you are using "fencing resource-only;", try commenting it out and unplugging the power again. That would at least help you narrow down where the problem is. This setting caused similar problems for me.

On Fri, 2007-12-14 at 10:52 -0800, Art Age Software wrote:
> Hi all,
>
> I've been working on setting up a two-node HA cluster with Heartbeat + DRBD.
> Everything has been going well and now I am in the testing phase.
> All initial tests looked good. Manual failover via heartbeat commands
> works as expected.
> Rebooting the machines works as expected with resources transitioning correctly.
> Finally, it was time for the biggest test of all...physically pulling
> power on the primary.
> Now, this is a painful thing to have to do - but of course necessary.
> So, naturally I had hoped it would "just work" on the first try.
> Unfortunately, it did not. DRBD became primary on node2, but the file
> system never mounted.
> From looking at the logs, DRBD on node2 seems to have aborted
> Heartbeat's attempt at takeover. I have copied a relevant portion of
> the log below.
> Can anybody offer any insight into what went wrong? Clearly, it isn't
> a highly available system if node2 won't take over in the event that
> node1 disappears.
>
> DRBD version is 8.0.6.
>
> Thanks,
>
> Sam
>
> Dec 14 18:26:30 node2 harc[5405]: [5418]: info: Running /etc/ha.d/rc.d/status status
> Dec 14 18:26:30 node2 mach_down[5431]: [5480]: info: Taking over resource group IPaddr::x.x.x.x/x
> Dec 14 18:26:30 node2 ResourceManager[5482]: [5509]: info: Acquiring resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x drbddisk::mysql Filesystem::/dev/drbd0::/drbd::ext3::noatime
> Dec 14 18:26:30 node2 IPaddr[5468]: [5561]: INFO: Running OK
> Dec 14 18:26:30 node2 heartbeat: [5406]: info: Local Resource acquisition completed.
> Dec 14 18:26:30 node2 IPaddr[5526]: [5579]: INFO: Resource is stopped
> Dec 14 18:26:30 node2 ResourceManager[5482]: [6076]: info: Running /etc/ha.d/resource.d/drbddisk mysql start
> Dec 14 18:26:30 node2 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
> Dec 14 18:26:31 node2 /usr/lib64/heartbeat/dopd: [4780]: info: send_message_to_the_peer: sending start_outdate message to the other node node2 -> node1
> Dec 14 18:26:49 node2 ResourceManager[5482]: [6110]: ERROR: Return code 20 from /etc/ha.d/resource.d/drbddisk
> Dec 14 18:26:49 node2 ResourceManager[5482]: [6111]: CRIT: Giving up resources due to failure of drbddisk::mysql
> Dec 14 18:26:50 node2 ResourceManager[5482]: [6112]: info: Releasing resource group: node1 IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x IPaddr::x.x.x.x/x drbddisk::mysql Filesystem::/dev/drbd0::/drbd::ext3::noatime
> Dec 14 18:26:50 node2 ResourceManager[5482]: [6127]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 noatime stop
> Dec 14 18:26:50 node2 Filesystem[6140]: [6169]: INFO: Running stop for /dev/drbd0 on /drbd
> Dec 14 18:26:50 node2 Filesystem[6129]: [6178]: INFO: Success
> Dec 14 18:26:50 node2 ResourceManager[5482]: [6193]: info: Running /etc/ha.d/resource.d/drbddisk mysql stop
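
For reference, here is roughly what the fencing suggestion at the top of this reply might look like in drbd.conf. This is only a sketch in DRBD 8.0.x syntax, not the poster's actual configuration (which was not posted). The resource name "mysql" is assumed from the drbddisk::mysql entry in the log, and the drbd-peer-outdater path is a guess based on the dopd path shown there; adjust both to match your setup.

```
# drbd.conf fragment (DRBD 8.0.x syntax); illustrative sketch only.
# Assumed: resource name "mysql" and the helper path below.
resource mysql {
  handlers {
    # Helper DRBD runs (via "drbdadm outdate-peer") when fencing is enabled.
    outdate-peer "/usr/lib64/heartbeat/drbd-peer-outdater";
  }
  disk {
    # Commented out for the power-pull test; DRBD then falls back to
    # its default fencing policy, "dont-care".
    # fencing resource-only;
  }
  # ... remaining sections (net, syncer, per-node "on" blocks) unchanged ...
}
```

After editing the file on both nodes, something like "drbdadm adjust mysql" should apply the change without a full restart. If takeover then succeeds when power is pulled, the outdate-peer step is the likely place to look.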