[DRBD-user] newbie drbd/HA configuration question

Mon Mar 27 00:09:37 CEST 2006

Well, this isn't a simple primary/failover setup; this is an active/active  
cluster, with one node "normally" being primary for some drbd devices  
and the other node "normally" being primary for the other devices.  
Hence my haresources file:

johnny drbddisk::web drbddisk::mail drbddisk::GRP drbddisk::data
cash drbddisk::db drbddisk::GRT drbddisk::GRD
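
To make the intended ownership explicit, here is a minimal Python sketch that turns the two haresources lines above into a resource-to-node map. The function name `parse_haresources` is hypothetical, and the parser is deliberately simplified: it assumes one node per line, ignores backslash line continuations, and only looks at `drbddisk::` entries.

```python
def parse_haresources(text):
    """Map each drbddisk resource name to the node that normally owns it."""
    owners = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and comments.
        if not fields or fields[0].startswith("#"):
            continue
        node, resources = fields[0], fields[1:]
        for res in resources:
            # Only drbddisk entries are of interest here.
            if res.startswith("drbddisk::"):
                owners[res.split("::", 1)[1]] = node
    return owners


HARESOURCES = """\
johnny drbddisk::web drbddisk::mail drbddisk::GRP drbddisk::data
cash drbddisk::db drbddisk::GRT drbddisk::GRD
"""

if __name__ == "__main__":
    for resource, node in sorted(parse_haresources(HARESOURCES).items()):
        print(f"{resource}: normally primary on {node}")
```

A map like this can be compared against the st: column in /proc/drbd on each node to confirm that every resource is primary where it is supposed to be.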

I don't think it's splitbrain because I'm not seeing two nodes being  
primary or secondary for the same thing. When I yanked the network,  
both nodes properly became primary for every drbd device. When I  
returned the network, they properly reverted back to being secondary  
on those DRBD devices they aren't normally supposed to be primary on.

Basically, it seems that things are working fine, except that for  
whatever reason *some* entries in /proc/drbd show a StandAlone  
connection, with the state of the other node reported as Unknown:

SVN Revision: 2093 build by root at cash, 2006-03-25 11:06:23
0: cs:Connected st:Secondary/Primary ld:Consistent
     ns:0 nr:2038150 dw:2038150 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
1: cs:StandAlone st:Primary/Unknown ld:Consistent
     ns:0 nr:0 dw:4497525 dr:476448 al:2735 bm:2478 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Secondary/Primary ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
3: cs:Connected st:Secondary/Primary ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
4: cs:StandAlone st:Secondary/Unknown ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
5: cs:Connected st:Primary/Secondary ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
6: cs:StandAlone st:Primary/Unknown ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
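
With several devices it is easy to miss one that has dropped its connection. The following is a minimal Python sketch that picks out the StandAlone entries, assuming the status format shown above (the `cs:`/`st:` layout of this DRBD version; other versions may print different fields). The names `parse_drbd_status` and `standalone_devices` are hypothetical, and the sample text is copied from the output above rather than read from a live /proc/drbd.

```python
import re

# Matches status lines such as:
#   1: cs:StandAlone st:Primary/Unknown ld:Consistent
STATUS_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+st:(?P<local>[^/\s]+)/(?P<peer>\S+)"
)


def parse_drbd_status(text):
    """Return one dict per device status line found in the text."""
    devices = []
    for line in text.splitlines():
        m = STATUS_LINE.match(line)
        if m:
            devices.append({
                "minor": int(m.group("minor")),
                "cs": m.group("cs"),        # connection state
                "local": m.group("local"),  # this node's role
                "peer": m.group("peer"),    # other node's role as seen locally
            })
    return devices


def standalone_devices(text):
    """Return the devices that are not connected to their peer."""
    return [d for d in parse_drbd_status(text) if d["cs"] == "StandAlone"]


SAMPLE = """\
SVN Revision: 2093 build by root at cash, 2006-03-25 11:06:23
0: cs:Connected st:Secondary/Primary ld:Consistent
     ns:0 nr:2038150 dw:2038150 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
1: cs:StandAlone st:Primary/Unknown ld:Consistent
     ns:0 nr:0 dw:4497525 dr:476448 al:2735 bm:2478 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Secondary/Primary ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
3: cs:Connected st:Secondary/Primary ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
4: cs:StandAlone st:Secondary/Unknown ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
5: cs:Connected st:Primary/Secondary ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
6: cs:StandAlone st:Primary/Unknown ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    for dev in standalone_devices(SAMPLE):
        print(f"device {dev['minor']}: {dev['cs']} ({dev['local']}/{dev['peer']})")
```

On a live node the same functions could be fed the contents of /proc/drbd instead of the sample string. In the output above, the devices that report Unknown for the peer are exactly the ones whose connection state is StandAlone.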

My ha.cf:

logfacility     local0
keepalive 2
ucast br1 10.5.5.91
ucast br2 10.2.3.91
auto_failback on
node    johnny cash

The ucast arguments are different on each node, of course.

Is it the case that when I bring a DRBD resource back into a cluster,  
I need to make sure it thinks it's secondary first?

On Mar 26, 2006, at 1:23 PM, Gary W. Smith wrote:

> Ben,
>
> It looks like you could be seeing a split brain operation from
> heartbeat.  That is where both nodes think they own something.
>
> Pulling the network cable isn't the best approach to testing.
> Rather than pulling the cables, gracefully shut down heartbeat on
> the primary node and see if it fails over cleanly.  Watch the secondary
> node at the same time and see if the resources are being owned
> properly.
>
> Then you can do some more destructive tests, like pulling the power
> cord (not recommended for production environments, but we do this as
> part of our testing in dev).
>
> You should also post your ha.cf, haresources and your network config
> files so we can try to help out on those.  Everything else is just
> guesswork on my part.
>
> Gary Wayne Smith
>
>> -----Original Message-----
>> From: Ben [mailto:bench at silentmedia.com]
>> Sent: Sunday, March 26, 2006 12:48 PM
>> To: Gary W. Smith
>> Cc: drbd-user at lists.linbit.com
>> Subject: Re: [DRBD-user] newbie drbd/HA configuration question
>>
>> Thanks! That's exactly what I needed to know. So, my next question
>> is, after doing this, pulling the network cable out and watching
>> heartbeat perform as desired, plugging the network cable back in and
>> watching heartbeat perform as desired, I see that things didn't quite
>> return to normal.
>>
>> One node sees a DRBD disk in Primary/Unknown, and the other sees the
>> disk in Secondary/Unknown. I'm able to use the disk just fine on the
>> primary, but why isn't each node able to see the state of the other
>> node? And should I be worried?
>>
>