OK . . . Backtracking a little here. I would still like to know where I
can get some dopd docs, but apparently “drbdadm outdate all” doesn’t work
unless the dopd nodes are disconnected. I tried disconnecting and it
worked.

Upon closer inspection of the dopd syslog error, it appears that it’s
looking for drbdadm in /sbin and my distro has it in /usr/sbin. I tried
both copying and symlinking (ln -s) drbdadm, drbdsetup and drbdmeta into
/sbin, but neither option works, and now the log shows

    unknown exit code from /sbin/drbdadm outdate all: 126

instead of

    unknown exit code from /sbin/drbdadm outdate all: 127

Any clues to where the path problem is?

Thanx

On Thu, 2007-12-06 at 09:28 -0800, Rois Cannon wrote:
> After trying Dominik's suggestion of running the drbd-peer-outdater
> manually, I've come to believe that dopd is NOT working on my machine.
>
> Just for giggles I took out the dopd stuff
> drbd.conf:
>     outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater
> ha.cf:
>     respawn hacluster /usr/lib/heartbeat/dopd
>     apiauth dopd gid=haclient uid=hacluster
> but left
> drbd.conf:
>     fencing resource-only
>
> and ran the same test where I pull the plug on node1. Node2 became
> outdated and so heartbeat would not bring it up. This suggests to me
> that the dopd stuff wasn't doing anything.
>
> On node2 (when everything was uptodate and running) I then tried running
> what the log says drbd-peer-outdater is doing:
>     drbdadm outdate all
>
> This command gave no errors and did not outdate the resources on node2.
> I was able to restart heartbeat on node1 and node2 took over the
> resources as I would expect if drbd was uptodate (as it was still marked
> after I tried outdating it manually).
>
> Florian (or others), do you have any ideas on what my problem is with
> drbdadm outdate all? I could use some suggestions on how to debug this.
>
> Thanx
> Rois
>
> On Thu, 2007-12-06 at 08:38 -0800, Rois Cannon wrote:
> > See below. Thoughts???
> > -Rois
> >
> > On Thu, 2007-12-06 at 10:38 +0100, Dominik Klein wrote:
> > > > I think I'm starting to get my mind around why and how dopd works, but
> > > > currently when the primary node goes down (pull the plug) the secondary
> > > > node is getting the message to outdate the drbd resource. Here is where
> > > > I think it is happening on node2 after I pull the plug on node1:
> > > > -------------------------------------------------------------------
> > > > Dec 4 13:22:13 svr92 kernel: drbd0: helper command: /sbin/drbdadm outdate-peer
> > > > Dec 4 13:22:13 svr92 kernel: drbd0: disk( UpToDate -> Outdated )
> > > > Dec 4 13:22:13 svr92 kernel: drbd0: outdate-peer helper broken, returned 255
> > > > Dec 4 13:22:13 svr92 kernel: drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
> > > > -------------------------------------------------------------------
> > > > I can't find any reference to what "returned 255" means but the
> > > > outdate-peer appears to be broken???
> > >
> > > I see this as well right now (as you may have noticed from my thread
> > > about resource fencing). See if you can start drbd-peer-outdater -p
> > > <peername> -r <resourcename> by hand (while heartbeat and dopd are running).
> > >
> > > On my test system, this segfaults and returns 255, which might indicate
> > > that it segfaults too when it is run by drbd (or heartbeat?).
> > >
> >
> > When I run
> >     /usr/lib/heartbeat/drbd-peer-outdater -p svr92 -r all
> > from the primary node (node1) I get no errors on node1 but I get this
> > message in the log on node2 (svr92 is node2):
> > -------------------------------------------------------------------------
> > Dec 6 08:21:39 svr92 /usr/lib/heartbeat/dopd: [4749]: info: msg_start_outdate: unknown exit code from /sbin/drbdadm outdate all: 127
> > Dec 6 08:21:39 svr92 /usr/lib/heartbeat/dopd: [4749]: info: msg_start_outdate: sending return code: 5, svr92 -> svr91
> > -------------------------------------------------------------------------
> >
> > Same thing in the opposite direction.
> >
> > > > So . . . node1 goes down and somehow in the process outdates node2's
> > > > resource so heartbeat can't bring it up. There goes my redundancy BUT
> > > > if node1 really is dead, how can I undo the outdate flag on node2 so I
> > > > can bring that node up as the primary until I can fix node1?
> > >
> > > You can always do "drbdadm -- --overwrite-data-of-peer primary
> > > <resourcename>". This will bring your outdated resource into primary
> > > state. But you don't want that to happen automatically.
> > >
> >
> > No joy on overwrite-data-of-peer but I did get it to come up when I ran:
> >     drbdsetup all primary -o
> > on node2 (while node1 was still off) and that brought it back to life as
> > uptodate.
> >
> > > Oh and btw: This is also true for the test version 2.1.3 of heartbeat,
> > > which was announced for testing yesterday and is scheduled for Dec 19th.
> > > It would be nice if it got fixed for 2.1.3.
> > >
> > > Regards
> > > Dominik
> > > _______________________________________________
> > > drbd-user mailing list
> > > drbd-user at lists.linbit.com
> > > http://lists.linbit.com/mailman/listinfo/drbd-user
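
A note on the two exit codes in the dopd log lines, since the thread never spells them out: in a POSIX shell, 127 means the command was not found, and 126 means it was found but could not be executed (typically a permissions problem). Going from 127 to 126 after putting drbdadm into /sbin would therefore suggest the path itself is no longer what fails. The sketch below reproduces both codes with throwaway files. It assumes dopd launches the helper through a shell, which nothing in this thread confirms, and the names no-such-drbdadm and fake-drbdadm are made up for illustration; they are not real DRBD binaries.

```shell
#!/bin/sh
# Reproduce shell exit codes 127 and 126 using throwaway files.
# Assumption: the helper is started via a shell, as system(3) would do.
# "no-such-drbdadm" and "fake-drbdadm" are hypothetical stand-in names.
tmpdir=$(mktemp -d)

# 127: nothing exists at the given path, so the shell cannot find the command.
sh -c "$tmpdir/no-such-drbdadm outdate all" 2>/dev/null
rc_missing=$?

# 126: the file exists but has no execute bit, so it cannot be run.
printf '#!/bin/sh\nexit 0\n' > "$tmpdir/fake-drbdadm"
chmod 644 "$tmpdir/fake-drbdadm"
sh -c "$tmpdir/fake-drbdadm outdate all" 2>/dev/null
rc_noexec=$?

echo "missing: $rc_missing, not executable: $rc_noexec"
rm -rf "$tmpdir"
```

If the same pattern applies on the cluster nodes, one thing that might be worth checking is whether the user dopd runs as (hacluster, going by the ha.cf lines quoted above) is actually allowed to execute /sbin/drbdadm, for example by looking at the output of ls -l /sbin/drbdadm.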