Well, this isn't a simple primary/failover setup; it's an active/active
cluster, with one node "normally" being primary for some drbd devices and
the other node "normally" being primary for the other devices. Hence my
haresources file:

    johnny drbddisk::web drbddisk::mail drbddisk::GRP drbddisk::data
    cash drbddisk::db drbddisk::GRT drbddisk::GRD

I don't think it's split brain, because I'm not seeing two nodes being
primary or secondary for the same thing. When I yanked the network, both
nodes properly became primary for every drbd device. When I returned the
network, they properly reverted to being secondary on those DRBD devices
they aren't normally supposed to be primary on.

Basically, it seems that things are working fine, except that for whatever
reason *some* entries in /proc/drbd show that the mode of the other device
is unknown:

    SVN Revision: 2093 build by root at cash, 2006-03-25 11:06:23
     0: cs:Connected st:Secondary/Primary ld:Consistent
        ns:0 nr:2038150 dw:2038150 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
     1: cs:StandAlone st:Primary/Unknown ld:Consistent
        ns:0 nr:0 dw:4497525 dr:476448 al:2735 bm:2478 lo:0 pe:0 ua:0 ap:0
     2: cs:Connected st:Secondary/Primary ld:Consistent
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
     3: cs:Connected st:Secondary/Primary ld:Consistent
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
     4: cs:StandAlone st:Secondary/Unknown ld:Consistent
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
     5: cs:Connected st:Primary/Secondary ld:Consistent
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
     6: cs:StandAlone st:Primary/Unknown ld:Consistent
        ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0

My ha.cf:

    logfacility local0
    keepalive 2
    ucast br1 10.5.5.91
    ucast br2 10.2.3.91
    auto_failback on
    node johnny cash

The ucast arguments are different on each node, of course.

Is it the case that when I bring a DRBD resource back into a cluster, I
need to make sure it thinks it's secondary first?

On Mar 26, 2006, at 1:23 PM, Gary W. Smith wrote:

> Ben,
>
> It looks like you could be seeing a split brain operation from
> heartbeat. That is where both nodes think they own something.
>
> Pulling the network cable isn't the best approach to doing the testing.
> Rather, instead of pulling the cables, gracefully shut down heartbeat on
> the primary node and see if it fails over fine. Watch the secondary
> node at the same time and see if the resources are being owned properly.
>
> Then you can do some more destructive tests like maybe pulling the power
> cord (not recommended for production environments but we do this as part
> of our testing in dev).
>
> You should also post your ha.cf, haresources and your network config
> files so we can try to help out on those. Everything else is just
> guesswork on my part.
>
> Gary Wayne Smith
>
>> -----Original Message-----
>> From: Ben [mailto:bench at silentmedia.com]
>> Sent: Sunday, March 26, 2006 12:48 PM
>> To: Gary W. Smith
>> Cc: drbd-user at lists.linbit.com
>> Subject: Re: [DRBD-user] newbie drbd/HA configuration question
>>
>> Thanks! That's exactly what I needed to know. So, my next question
>> is, after doing this, pulling the network cable out and watching
>> heartbeat perform as desired, plugging the network cable back in and
>> watching heartbeat perform as desired, I see that things didn't quite
>> return to normal.
>>
>> One node sees a DRBD disk in Primary/Unknown, and the other sees the
>> disk in Secondary/Unknown. I'm able to use the disk just fine on the
>> primary, but why isn't each node able to see the state of the other
>> node? And should I be worried?
>>
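
For reference, the /proc/drbd listing earlier in this message can be checked mechanically. In that listing, every device whose peer state is Unknown also reports cs:StandAlone (devices 1, 4 and 6), while the Connected devices all show the peer's state. The following is only a rough Python sketch for picking those devices out of such output. The function names `parse_status` and `standalone_minors` are made up for illustration, and the regular expression assumes the cs:/st: line layout shown in this message, which may differ in other DRBD versions.

```python
import re

# Status lines copied from the /proc/drbd listing in this message
# (only the cs:/st:/ld: line of each device; the counter lines are omitted).
SAMPLE = """\
 0: cs:Connected st:Secondary/Primary ld:Consistent
 1: cs:StandAlone st:Primary/Unknown ld:Consistent
 2: cs:Connected st:Secondary/Primary ld:Consistent
 3: cs:Connected st:Secondary/Primary ld:Consistent
 4: cs:StandAlone st:Secondary/Unknown ld:Consistent
 5: cs:Connected st:Primary/Secondary ld:Consistent
 6: cs:StandAlone st:Primary/Unknown ld:Consistent
"""

# Assumed layout: "<minor>: cs:<connection state> st:<local role>/<peer role>".
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)")


def parse_status(text):
    """Return one dict per device status line found in the text."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            minor, cs, local, peer = match.groups()
            devices.append(
                {"minor": int(minor), "cs": cs, "local": local, "peer": peer}
            )
    return devices


def standalone_minors(text):
    """Return the minor numbers of devices whose connection state is StandAlone."""
    return [d["minor"] for d in parse_status(text) if d["cs"] == "StandAlone"]


if __name__ == "__main__":
    print(standalone_minors(SAMPLE))
```

On a live node, the same functions could be fed the contents of /proc/drbd read from disk instead of `SAMPLE`. What to do about a StandAlone device is a separate question that this sketch does not try to answer.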