> I think I'm starting to get my mind around why and how dopd but
> currently when the primary node goes down (pull the plug) the secondary
> node is getting the message to outdate the drbd resource. Here is where
> I think it is happening on node2 after I pull the plug on node1:
> -------------------------------------------------------------------
> Dec 4 13:22:13 svr92 kernel: drbd0: helper command: /sbin/drbdadm
> outdate-peer
> Dec 4 13:22:13 svr92 kernel: drbd0: disk( UpToDate -> Outdated )
> Dec 4 13:22:13 svr92 kernel: drbd0: outdate-peer helper broken,
> returned 255
> Dec 4 13:22:13 svr92 kernel: drbd0: State change failed: Refusing to be
> Primary without at least one UpToDate disk
> -------------------------------------------------------------------
> I can't find any reference to what "returned 255" means but the
> outdate-peer appears to be broken???

I see this as well right now (as you may have noticed from my thread
about resource fencing). See if you can start

    drbd-peer-outdater -p <peername> -r <resourcename>

by hand (while heartbeat and dopd are running). On my test system, this
segfaults and returns 255, which might indicate that it also segfaults
when it is run by drbd (or heartbeat?).

> So . . . node1 goes down and somehow in the process outdates node2's
> resource so heartbeat can't bring it up. There goes my redundancy BUT
> if node1 really is dead, how can I undo the outdate flag on node2 so I
> can bring that node up as the primary until I can fix node1?

You can always do

    drbdadm -- --overwrite-data-of-peer primary <resourcename>

This will bring your outdated resource into primary state. But you
don't want that to happen automatically.

Oh and btw: This is also true for the test version 2.1.3 of heartbeat,
which was announced for testing yesterday and is scheduled for Dec 19th.
It would be nice if this got fixed for 2.1.3.

Regards
Dominik
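
PS: In case it helps with the manual check of drbd-peer-outdater described above, here is a rough Python sketch that runs a command and reports whether it exited with a status (such as 255) or was killed by a signal (such as SIGSEGV). The function names describe_exit and run_helper are made up for illustration, and the sketch assumes drbd-peer-outdater is on the PATH and takes -p and -r as shown in this message. It does not say anything about what 255 means to drbd itself.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # subprocess reports death by signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by %s" % name
    if returncode == 0:
        return "exited cleanly (0)"
    return "exited with status %d" % returncode


def run_helper(argv):
    """Run argv without a shell; return (returncode, description).

    Raises FileNotFoundError if the program cannot be found.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, describe_exit(proc.returncode)


# Hypothetical usage on a cluster node, with heartbeat and dopd running
# (replace the placeholders with your peer and resource names):
#
#   code, text = run_helper(
#       ["drbd-peer-outdater", "-p", "<peername>", "-r", "<resourcename>"])
#   print(code, text)
```

Running the helper directly instead of through a shell matters here: a shell would fold a crash into an ordinary exit status, while subprocess keeps "killed by a signal" and "exited with a status" apart.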