[DRBD-user] primary goes to standalone when secondary tries to connect

Wed Oct 25 22:10:48 CEST 2006

Thanks Lars,
   I was testing with my normal shutdown scripts on the primary, which shut 
down my network interface but never attempted to cleanly shut down drbd. So, as 
you stated, that node continued to act as the primary after communication 
between the servers failed.  I have now modified my shutdown scripts to cleanly 
stop drbd before they turn off the network interface, and it works great.
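
A minimal sketch of that ordering, not the actual scripts. It assumes the 
filesystem on the drbd device is already unmounted, that the replication link 
is brought down with `ifdown`, and that the interface is called `eth1`; the 
interface name and the helper names `shutdown_steps` and `run_shutdown` are 
made up for illustration. `drbdadm secondary all` and `drbdadm down all` are 
the standard drbdadm subcommands.

```python
import subprocess


def shutdown_steps(iface="eth1"):
    """Return the shutdown commands in the order they should run.

    drbd is demoted and taken down while the peer is still reachable,
    and only then is the (assumed) replication interface dropped.
    """
    return [
        ["drbdadm", "secondary", "all"],  # give up the Primary role first
        ["drbdadm", "down", "all"],       # disconnect and detach cleanly
        ["ifdown", iface],                # last step: take the link down
    ]


def run_shutdown(iface="eth1", runner=subprocess.check_call):
    """Run the steps in order; `runner` can be swapped out for a dry run."""
    for cmd in shutdown_steps(iface):
        runner(cmd)
```

Passing `runner=print` gives a dry run that only lists the commands.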

Lars Ellenberg wrote:
> / 2006-10-24 14:19:53 -0700
> \ Eric:
>> I've just installed a drbd(0.7.21)/heartbeat setup for my ftp servers and I've run into a problem.  When the 
>> primary (hostA) goes offline, the secondary (hostB) promotes itself correctly and hostB now has a status of
>>  0: cs:WFConnection st:Primary/Unknown ld:Consistent
>>     ns:0 nr:56 dw:108 dr:265 al:0 bm:4 lo:0 pe:0 ua:0 ap:0
>> Now when hostA comes back online and tries to make a connection, hostB changes its state to StandAlone.  After 
>> doing a drbdadm adjust all on hostB, they reestablish a connection.  The status when hostA comes back online shows
> 
> the behaviour depends on exactly _how_ hostA "goes offline".
> you probably need to get the order of your stop scripts right.
> 
> in your case, it seemingly drops the network connection
> before it is switched to secondary state.
> at least that's what the generation counters say.
> 
> 
> so there is a period of time where hostA is Primary without
> communicating with hostB.
> 
> then hostB is switched to Primary,
> still without communicating with hostA.
> 
> next hostA reconnects.
> if both had been Secondary at this point,
> the sync would have been from A -> B,
> and depending on how long hostB had been Primary on its own,
> you'd complain about a resync happening "in the wrong direction".
> [[*]but see below].
> 
> Since hostB is still Primary at this time,
> drbd detects the previous split brain:
> 
> hostB:
>> drbd0: I am(P): 1:00000002:00000001:00000012:0000000a:10
>> drbd0: Peer(S): 1:00000002:00000001:00000013:00000009:00
>> drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
> 
> now, when you reconnect again, the generation counts have meanwhile become
> 
>> drbd0: I am(P): 1:00000002:00000001:00000013:0000000a:10
>> drbd0: Peer(S): 1:00000002:00000001:00000013:00000009:00
> 
> (which is, depending on the point of view, a bug or feature),
> and hostB (the current primary) wins.
> 
> [*]
> in drbd8, we dropped that generation count scheme, since it cannot cope
> with a variety of error scenarios, one of them being several independent
> actions on both nodes while being unable to communicate.
> instead, we uuid-tag data generations, so we can reliably detect
> previous split-brain situations, and offer configurable auto-resolve
> strategies.
>
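
To make the two pairs of "I am"/"Peer" lines quoted above easier to read, here 
is a rough sketch that parses them and compares the counters from left to 
right. The field layout (a consistency flag, four hexadecimal counters, then a 
trailing pair of flag digits) and the plain left-to-right comparison are 
assumptions that happen to match both excerpts; this is not drbd's actual 
comparison code, and `parse_gc` and `newer` are names invented for the example.

```python
import re

# Matches e.g. "drbd0: I am(P): 1:00000002:00000001:00000012:0000000a:10"
GC_LINE = re.compile(r"(I am|Peer)\((\w)\):\s*([0-9a-fA-F:]+)")


def parse_gc(line):
    """Return (role, counters, flags) parsed from one log line.

    counters is a tuple: the leading flag followed by the four hex counters.
    flags is the trailing two-character field, kept as a string.
    """
    m = GC_LINE.search(line)
    if m is None:
        raise ValueError("not a generation counter line: %r" % line)
    _who, role, raw = m.groups()
    parts = raw.split(":")
    if len(parts) != 6:
        raise ValueError("expected 6 colon-separated fields: %r" % raw)
    counters = (int(parts[0]),) + tuple(int(p, 16) for p in parts[1:5])
    return role, counters, parts[5]


def newer(mine, peer):
    """Say whose counters are higher under a left-to-right comparison."""
    _, my_cnt, _ = parse_gc(mine)
    _, peer_cnt, _ = parse_gc(peer)
    if my_cnt > peer_cnt:
        return "me"
    if my_cnt < peer_cnt:
        return "peer"
    return "tie"


# First connection attempt: the peer's third counter is higher (0x13 > 0x12).
first = newer("drbd0: I am(P): 1:00000002:00000001:00000012:0000000a:10",
              "drbd0: Peer(S): 1:00000002:00000001:00000013:00000009:00")

# Second attempt: third counter is equal, the fourth decides (0x0a > 0x09).
second = newer("drbd0: I am(P): 1:00000002:00000001:00000013:0000000a:10",
               "drbd0: Peer(S): 1:00000002:00000001:00000013:00000009:00")
```

Under these assumptions `first` comes out as "peer", which fits the "Current 
Primary shall become sync TARGET" abort, and `second` comes out as "me", which 
fits hostB winning on the next reconnect.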