[DRBD-user] Question re. syncing

Lars Ellenberg Lars.Ellenberg at linbit.com
Tue Aug 15 13:58:29 CEST 2006

Note: "permalinks" may not be as permanent as we would like;
direct links to old messages may well be a few messages off.


/ 2006-08-15 10:52:19 +0800
\ Adrian Hicks:
> I've done some testing re. which machine becomes sync source/target and 
> haven't got myself a conclusive answer so feel it would be quicker to ask 
> a question here.
> 
> Not considering Heartbeat, how does drbd (0.7) decide which machine is the 
> sync source and which is the sync target?
> 
> Does making a node primary for a device = making that node the sync source 
> for that device?

no. but it does prevent that node from becoming sync target:
 "Current Primary shall become sync TARGET.
  Aborting to prevent data corruption."

> Let's say I boot both machines immediately after configuring drbd & the 
> drbd device is started as secondary, secondary.

well, there is more state to it.
it comes up as Secondary Inconsistent on both nodes.
Inconsistent meaning: no valid local data.

> When I make one node primary for the device, does this become the sync 
> source?

as long as it has no valid data (neither local nor remote), it
refuses to become primary.  so the first time you have to force it
primary (--do-what-I-say), which declares the local copy of the
data consistent.

since after this force you have one valid data set (the one you
forced) and one invalid data set (the other one), the sync will
run from valid ("consistent") to invalid ("inconsistent") data.

in drbd 8 we have therefore renamed the option necessary to force
it to --overwrite-data-of-peer, which is more expressive.
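the valid/invalid rule above can be sketched in a few lines. this
is an illustrative toy, not drbd's actual code; the function name
and return strings are made up for the example:

```python
def sync_direction(local_consistent, peer_consistent):
    """Decide sync direction from the consistency flags alone.

    Sync always runs from valid ("consistent") data to invalid
    ("inconsistent") data; if both sides look alike, the direction
    must be decided by other means (generation comparison).
    """
    if local_consistent and not peer_consistent:
        return "local is sync source"
    if peer_consistent and not local_consistent:
        return "peer is sync source"
    return None  # both valid or both invalid: undecidable here

# freshly configured: both nodes come up Inconsistent, no sync possible
print(sync_direction(False, False))   # None
# after forcing one node primary, its data counts as consistent:
print(sync_direction(True, False))    # local is sync source
```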

> If so, is this always the case, ie. does making one node primary 
> for a drbd device dictate that that node is the sync source?

no. that was the way in drbd 0.6 and before.
we decoupled that in drbd 0.7.

read more about the event counter based generation scheme here
 http://www.drbd.org/publications.html
  look for drbd_paper_for_NLUUG_2001
  section 6.3
(there should be a more recent thing about it somewhere, too,
 since the scheme got extended to have a "timeout" counter)     
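roughly, that scheme keeps a tuple of per-event counters in the
meta data and compares them on reconnection; the "newer" generation
becomes sync source. the counter names and values below are
placeholders for illustration, not drbd 0.7's exact on-disk fields:

```python
def pick_sync_source(gen_a, gen_b):
    """Compare two generation counter tuples lexicographically.

    Each tuple entry counts events of one kind (e.g. forced by a
    human, survived a peer timeout, connected while peer was down);
    the node with the higher tuple becomes sync source.
    """
    if gen_a > gen_b:        # tuples compare lexicographically
        return "A"
    if gen_b > gen_a:
        return "B"
    return "equal"           # same generation: nodes are in sync

# node A survived an event (e.g. kept running after a peer timeout),
# so its second counter is higher and it becomes sync source:
print(pick_sync_source((1, 2, 0, 0), (1, 1, 0, 0)))   # A
```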

this scheme did the trick in most situations, but it has
shortcomings: during split brain, independent event sequences on
the non-communicating nodes may lead to generation counters the
algorithm cannot handle "correctly" on reconnection.

for drbd 8, whenever we may be modifying a data set independently
from our peer, we tag the data set generation with a new uuid.

similar to the algorithm certain version control systems use,
by keeping a short history of previous tags we can in theory[*]
detect whether the data sets to be "merged" on reconnection can be
merged automatically (one is a direct predecessor of the other, so
it becomes sync target), or whether they have been modified
independently (split brain detected) - in the latter case the
nodes won't talk to each other unless explicitly told (by
"auto-solve policy" configuration or on the commandline) which
data set should be discarded.

a VCS would present you with a three way merge here;
we cannot do that, so you have to solve this manually.

[*] in fact, not only in theory. but our current implementation
may still not correctly cover all corner cases involving disk
failure while writing to the meta data area.
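the predecessor test can be sketched like this. again a toy, with
made-up uuids and history depth, not drbd 8's on-disk layout:

```python
def classify(local_current, local_history, peer_current, peer_history):
    """Classify two data sets from current uuid + short uuid history."""
    if local_current == peer_current:
        return "in sync"                 # same generation on both sides
    if local_current in peer_history:
        return "local is sync target"    # peer's data descends from ours
    if peer_current in local_history:
        return "peer is sync target"     # our data descends from peer's
    return "split brain"                 # modified independently

# peer wrote while we were away: our current uuid is in its history
print(classify(0xA1, [0x90], 0xB2, [0xA1, 0x90]))  # local is sync target
# both wrote independently: neither current uuid in the other's history
print(classify(0xC3, [0xA1], 0xB2, [0xA1]))        # split brain
```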

read some more about the concepts in drbd 8 in the
 drbd7_wpnr.pdf
 section 5
found on the same "publications" page.

cheers,

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


