Hi Lars,

Turns out, I had to take down drbd on the secondary and completely resync the partitions before I could get them to see each other. Not sure how I got into that situation, though. So I now have a drbd master/slave with a filesystem resource following it around via heartbeat.

I have noticed one unusual aspect, which I wanted to ask you about. Right now, the filesystem resource mounts fine on one node but not the other, due to an earlier error in the switching. I believe I can correct the situation, but I noticed that the drbd_promote function first checks "are we already primary (local)?" and, if so, returns success, and then checks "are we not secondary (local)?" and, if so, returns a generic error. It gets through to the drbdadm primary command only if we are truly secondary. However, there is no check to make sure the remote is not still primary.

What I'm seeing is a promote call where the secondary node tried to go primary while the primary node was still primary (not sure if it just never got the secondary command, or if it's a timing issue and it just hasn't finished switching from primary to secondary). This triggers an error: "State change failed: (-1) Multiple primaries not allowed by config". The exit code of the drbdadm command is 0, even though the exit code of the drbdsetup call made inside drbdadm is 11. This 0 propagates back out to drbd_promote and gets returned to heartbeat as success, so heartbeat assumes it's OK to proceed with the Filesystem mount (which of course fails).
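A minimal sketch of a post-promote check, assuming DRBD 8.0's `drbdadm state <resource>` prints the roles as `local/peer` (e.g. `Primary/Secondary`) and using the OCF return codes 0 (OCF_SUCCESS) and 1 (OCF_ERR_GENERIC). The functions `local_role` and `promote_and_verify` and the overridable `DRBD_STATE_CMD`/`DRBD_PRIMARY_CMD` variables are hypothetical names, not part of the shipped drbd OCF script. Instead of relying on drbdadm's exit status alone, it re-reads the local role after the promote and returns a generic error if the node is not actually Primary.

```sh
# Hypothetical sketch, not the shipped drbd OCF agent.
# Assumes `drbdadm state <res>` prints "<local>/<peer>", e.g. "Primary/Secondary".
DRBD_STATE_CMD=${DRBD_STATE_CMD:-"drbdadm state"}
DRBD_PRIMARY_CMD=${DRBD_PRIMARY_CMD:-"drbdadm primary"}

# OCF return codes used below.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Print the local role (the part before the slash). Prints nothing if the
# output contains no slash, e.g. for an unconfigured device.
local_role() {
    $DRBD_STATE_CMD "$1" 2>/dev/null | sed -n 's|/.*||p'
}

# Promote resource $1, then confirm the node really is Primary, because
# drbdadm may exit 0 even when the underlying drbdsetup call failed.
promote_and_verify() {
    res=$1
    # Capture output for logging; $? here is the exit status of the command.
    out=$($DRBD_PRIMARY_CMD "$res" 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "promote of $res failed (rc=$rc): $out" >&2
        return $OCF_ERR_GENERIC
    fi
    role=$(local_role "$res")
    if [ "$role" != "Primary" ]; then
        echo "promote of $res returned 0 but local role is '$role': $out" >&2
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```

In the failing case described here, drbdadm returns 0 but the state would still read `Secondary/Primary`, so this check would report OCF_ERR_GENERIC instead of success and heartbeat would not go on to mount the Filesystem.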
See error snippet below:

drbd[9059]: 2007/04/25_08:44:22 DEBUG: pgsql: Calling /sbin/drbdadm -c /etc/drbd.conf primary pgsql
drbd[9059]: 2007/04/25_08:44:23 DEBUG: pgsql: Exit code 0
drbd[9059]: 2007/04/25_08:44:23 DEBUG: pgsql: Command output: State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
lrmd[1171]: 2007/04/25_08:44:23 info: RA output: (rsc_drbd_7788:0:promote:stdout) State change failed: (-1) Multiple primaries not allowed by config Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with exit code 11
crmd[1176]: 2007/04/25_08:44:23 info: process_lrm_event: LRM operation rsc_drbd_7788:0_promote_0 (call=50, rc=0) complete

What I propose to do is put a conditional wait loop in the promote function that checks the remote status and, if it's Primary, sleeps and loops until the remote is no longer Primary, or gives up after a set number of loops. I think this provides a safer promote, though it doesn't address the exit-code problem shown by the log messages above.

Thoughts?

Doug

On Tue, 2007-04-24 at 12:32 -0400, Doug Knight wrote:
> Hi Lars,
> I think I'm pretty close on getting the filesystem following drbd, but
> I've encountered a strange situation. drbd is running on both nodes,
> one indicating Secondary from the /proc/drbd file and Slave in
> heartbeat, the other Master and Primary. However, both show the other
> as Unknown. I'm able to ping successfully over the dedicated CAT6
> cable between the nodes. Any idea why the two drbd processes wouldn't
> see each other?
>
> Doug
> On Tue, 2007-04-24 at 08:00 -0400, Doug Knight wrote:
>
> > On Tue, 2007-04-24 at 11:07 +0200, Lars Ellenberg wrote:
> >
> > > On Mon, Apr 23, 2007 at 02:39:16PM -0400, Doug Knight wrote:
> > > > Hey Lars,
> > >
> > > please read the last line of my sig... :)
> > >
> >
> > Sorry, I was jumping between email lists, and some of the others
> > come in with a reply-to already set to the list, not the
> > individual...
> >
> > > >
> > > > Quick question: Comparing the printf text in your patch to the various
> > > > checks in the drbd ocf script, should the patch be printf'ing "Not
> > > > configured" as the script is checking for (DRBD_STATE_LOCAL checks)? If
> > > > these should match, then I'd actually prefer to change the checks in the
> > > > ocf script to match your patch (I like not having the embedded blank).
> > > >
> > > > Doug
> > >
> > > I think the 0.7 drbdsetup reported "Not configured".
> > > So maybe we should stick with that?
> > >
> > > but since scripts tailored for 0.7 are likely to be broken for 8.0
> > > anyways, we could also change this to "Unconfigured".
> > > I'm not sure yet, but I'm more for consistency with the /proc/drbd
> > > output, so it's probably going to be "Unconfigured".
> > >
> >
> > Good, I have already changed my script to look for Unconfigured, and
> > now I can switch drbd back and forth between nodes using a Place
> > constraint with no problem. Now, I'm working on tying the filesystem
> > resource to it. I've got a couple of questions up on the HA-Linux
> > list on that. There is so much overlap between the HA and the drbd
> > work that sometimes it's hard to tell which list I should send to (or
> > just keep track of where I've asked what ;)
> >
> > Thanks!
> >
> > _______________________________________________
> > drbd-user mailing list
> > drbd-user at lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user
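The wait loop Doug proposes could look roughly like this minimal sketch, again assuming `drbdadm state <resource>` prints `local/peer`. The names `peer_role` and `wait_for_peer_demote`, the retry-count and delay parameters, and the overridable `DRBD_STATE_CMD` are hypothetical, for illustration only. It polls the peer's role and returns 0 as soon as the peer is no longer Primary, or 1 after the given number of checks, so the promote can fail cleanly instead of racing the old primary.

```sh
# Hypothetical sketch of the proposed wait loop (not the shipped agent).
# Assumes `drbdadm state <res>` prints "<local>/<peer>", e.g. "Secondary/Primary".
DRBD_STATE_CMD=${DRBD_STATE_CMD:-"drbdadm state"}

# Print the peer's role (the part after the slash).
peer_role() {
    $DRBD_STATE_CMD "$1" 2>/dev/null | sed -n 's|^[^/]*/||p'
}

# Wait until the peer of resource $1 is no longer Primary.
# $2 = maximum number of checks (default 10), $3 = seconds between checks (default 1).
# Returns 0 once the peer is not Primary, 1 if it is still Primary after all checks.
wait_for_peer_demote() {
    res=$1
    tries=${2:-10}
    delay=${3:-1}
    n=0
    while [ "$n" -lt "$tries" ]; do
        if [ "$(peer_role "$res")" != "Primary" ]; then
            return 0
        fi
        n=$((n + 1))
        # Do not sleep after the final check.
        if [ "$n" -lt "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

In a promote function it would run just before the `drbdadm primary` call, e.g. `wait_for_peer_demote pgsql 10 1 || return 1`, with the total wait kept below the promote operation's timeout. One caveat: when the nodes are disconnected the peer role reads `Unknown` (as in the earlier message in this thread), and this loop treats that as "not Primary" and proceeds; whether a promote should go ahead in that case is a policy choice the sketch does not make.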