[DRBD-user] primary goes to standalone when secondary tries to connect

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Oct 25 10:03:44 CEST 2006



/ 2006-10-24 14:19:53 -0700
\ Eric:
> I've just installed a drbd(0.7.21)/heartbeat setup for my ftp servers and I've ran into a problem.  When the 
> primary (hostA) goes offline, the secondary (hostB) promotes itself correctly and hostB now has a status of
>  0: cs:WFConnection st:Primary/Unknown ld:Consistent
>     ns:0 nr:56 dw:108 dr:265 al:0 bm:4 lo:0 pe:0 ua:0 ap:0
> Now when hostA comes back online and tries to make a connection, hostB changes its state to StandAlone.  After 
> doing a drbdadm adjust all on hostB, they reestablish a connection.  The status when hostA comes back online shows

the behaviour depends on exactly _how_ hostA "goes offline".
you probably need to get the order of your stop scripts right.

in your case, it seemingly drops the network connection
before it is switched to secondary state.
at least that's what the generation counters say.


so there is a period of time where hostA is Primary without
communicating with hostB.

then hostB is switched to Primary,
still without communicating with hostA.

next hostA reconnects.
if both had been Secondary at this point,
the sync would have been from A -> B,
and depending on how long hostB had been Primary on its own,
you'd complain about a resync happening "in the wrong direction".
[but see [*] below].

Since hostB is still Primary at this time,
drbd detects the previous split brain:

hostB:
> drbd0: I am(P): 1:00000002:00000001:00000012:0000000a:10
> drbd0: Peer(S): 1:00000002:00000001:00000013:00000009:00
> drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.

now, when you reconnect again, the generation counts have meanwhile become

> drbd0: I am(P): 1:00000002:00000001:00000013:0000000a:10
> drbd0: Peer(S): 1:00000002:00000001:00000013:00000009:00

(which is, depending on the point of view, a bug or feature),
and hostB (the current primary) wins.
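To make the two handshakes above easier to follow, here is a minimal Python sketch of the comparison as it can be read off the log lines. It is a simplified model, not drbd 0.7's actual code. It assumes the leading field is a consistency flag, the four hex fields are generation counters compared left to right (the first difference wins), and the first character of the trailing field is a "was Primary" indicator that only breaks a tie. `parse_gc`, `compare_gc` and `decide_role` are hypothetical names.

```python
def parse_gc(line: str) -> tuple:
    """Turn a 'C:xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx:PF' string into a comparable key.

    Assumption (not taken from drbd's source): C = consistent flag,
    then four hex generation counters, and the first character of the
    last field = "was Primary" indicator.
    """
    fields = line.split(":")
    consistent = int(fields[0])
    counters = tuple(int(f, 16) for f in fields[1:5])
    was_primary = int(fields[5][0])
    # Tuple order = comparison priority; Python compares tuples element by element.
    return (consistent, *counters, was_primary)


def compare_gc(mine: str, peer: str) -> int:
    """Return 1 if my data looks newer, -1 if the peer's does, 0 if equal."""
    a, b = parse_gc(mine), parse_gc(peer)
    return (a > b) - (a < b)


def decide_role(mine: str, peer: str, i_am_primary: bool) -> str:
    result = compare_gc(mine, peer)
    if result < 0 and i_am_primary:
        # A Primary must never be overwritten by a resync from its peer.
        return "abort: current Primary would become sync target"
    if result > 0:
        return "sync source"
    if result < 0:
        return "sync target"
    return "no resync needed"


# The two handshakes from hostB's point of view, using the log lines above.
peer_gc = "1:00000002:00000001:00000013:00000009:00"
print(decide_role("1:00000002:00000001:00000012:0000000a:10", peer_gc, True))
# abort: current Primary would become sync target
print(decide_role("1:00000002:00000001:00000013:0000000a:10", peer_gc, True))
# sync source
```

In this model, the first handshake is decided by the third counter (0x12 < 0x13), and the second by the last counter (0x0a > 0x09) once the third ones are equal. Calling the first case with `i_am_primary=False` returns "sync target", i.e. the A -> B resync that would have happened had both nodes been Secondary.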

[*]
in drbd8, we dropped that generation count scheme, since it cannot cope
with a variety of error scenarios, one of them being several independent
actions on both nodes while being unable to communicate.
instead, we uuid-tag data generations, so we can reliably detect
previous split-brain situations, and offer configurable auto-resolve
strategies.
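For comparison, here is a minimal sketch of the idea behind uuid-tagged data generations. It is a deliberately simplified model, not drbd8's actual handshake or metadata layout. It assumes each node keeps a current UUID plus a list of older ones, and starts a new UUID whenever it modifies data without its peer (for example when it becomes Primary while disconnected). `Node`, `new_generation` and `uuid_handshake` are hypothetical names, and the sketch stops at detection; it does not model the auto-resolve strategies.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified per-node metadata (not drbd8's on-disk format)."""
    name: str
    current: str
    history: list = field(default_factory=list)

    def new_generation(self) -> None:
        # Assumed trigger: data is modified without the peer, e.g. the node
        # became Primary while disconnected. Keep the old tag as history.
        self.history.insert(0, self.current)
        self.current = uuid.uuid4().hex


def uuid_handshake(me: Node, peer: Node) -> str:
    if me.current == peer.current:
        return "in sync"
    if peer.current in me.history:
        # The peer's data is an ancestor of mine: I simply moved ahead.
        return f"{me.name} is sync source"
    if me.current in peer.history:
        return f"{peer.name} is sync source"
    if set(me.history) & set(peer.history):
        # Common ancestor, but both moved on independently: split brain.
        return "split brain"
    return "unrelated data"


# The scenario from this thread: both start from the same data generation,
# then each becomes Primary while the other is unreachable.
shared = uuid.uuid4().hex
host_a, host_b = Node("hostA", shared), Node("hostB", shared)
host_a.new_generation()
host_b.new_generation()
print(uuid_handshake(host_b, host_a))  # split brain
```

Unlike counters, which can only say "bigger" or "smaller", the UUID history lets each side tell "my peer is behind me" apart from "we both diverged from a common point", which is exactly the case the generation counts above could not recognize.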

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


