[DRBD-user] database on drbd

Thu Apr 1 08:40:34 CEST 2004

/ 2004-03-31 19:27:52 -0500
\ Todd Denniston:
> Lars Ellenberg wrote:
> > 
> > / 2004-03-05 06:37:46 -0800
> > \ abhin.g.s @ RASC:
> > > hi,
> <SNIP>
> > >   i am finding a small trouble with the drbd configuration i am
> > > having, if i shutdown the nodes with shutdown -h 0, one at a
> > > time and reboot at next time, the secondary node will take
> > > around 4 min to sync up. is it normal or any problem?
> > 
> > Best you move both to secondary state while they are still
> > connected, then shut down the network and everything else.
> > 
> > Probably you have to tweak a little bit the order of your
> > init.d/*/* resource scripts.
> > 
> >         Lars Ellenberg
> 
> As I think I am seeing the result of this, it bit me when I took both nodes
> down yesterday, please tell me if this interpretation is correct.
> 
> node paul, drbd primary and heartbeat active node exporting drives.
> node saul, drbd secondary and ignoring the rest of the world but paul.
> 
> on saul, issue `shutdown -h now`
> wait for saul to power down fully.
> write ~6Mb to a drbd controlled filesystem.
> on paul, issue `shutdown -h now`
> wait for paul to power down fully.
> 
> boot paul, allow it to start loading the kernel (i.e. a large head start) then
> boot saul.
> 
> expected result (from my brain): SyncingQuick
> only a little bit changed on paul, and saul was never in primary after it or
> paul went down, AND paul got to come back first.
> 
> experienced result (cat /proc/drbd): SyncingAll
> 
> so both nodes have to be in secondary (and both know it) before either goes
> down, and stay that way until repowered.
> 
> 
>  Correct???

Correct in concept.
BUT: unfortunately not implemented (in 0.6.x) :( 

The 0.6.x series does NOT yet have a persistent
bitmap, which is needed for the "Quick" sync.
So after the reboot, there is no way for the nodes to *know* that only
a little bit has changed. The only thing they know is that one was
primary, without seeing the other one, and then went down.
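
To illustrate why the missing persistent bitmap matters, here is a toy
sketch in Python. It is not DRBD code and has nothing to do with how DRBD
stores its meta data; the JSON file and the names save_bitmap, load_bitmap
and blocks_to_resync are made up for this example. The point is only that
a bitmap which does not survive the reboot leaves no choice but to
resync every block.

```python
import json
import os


def save_bitmap(path, dirty_blocks):
    """Write the set of dirty block numbers to disk (toy persistent bitmap)."""
    with open(path, "w") as f:
        json.dump(sorted(dirty_blocks), f)


def load_bitmap(path):
    """Return the saved dirty set, or None if nothing survived the reboot.

    path=None models the 0.6.x situation: the bitmap lived in memory only.
    """
    if path is None or not os.path.exists(path):
        return None
    with open(path) as f:
        return set(json.load(f))


def blocks_to_resync(total_blocks, bitmap):
    """Pick the blocks to copy to the peer after both nodes are back."""
    if bitmap is None:
        # Nothing is known about what changed: everything must be copied.
        return set(range(total_blocks))
    # Only the blocks recorded as dirty need to be copied.
    return set(bitmap)
```

With a saved bitmap, blocks_to_resync returns just the recorded blocks;
with load_bitmap(None) it returns all of them.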

So in 0.6.x we need a full synchronization
 * whenever a connected Primary node went down (crash)
 * whenever an UNconnected Primary node changed state

Or, to put it the other way: we ONLY get a "Quick" sync
if the current Primary stays up, i.e. when the Secondary fails and rejoins,
or, much more likely, when the replication network has a hiccup.
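
The rule above can be written down as a small Python sketch. This only
restates the two conditions from this mail; it is not taken from the DRBD
source, and the function and parameter names are invented for the example.
The return values are the state names as they show up in /proc/drbd.

```python
def sync_kind_0_6(primary_crashed_while_connected: bool,
                  primary_changed_state_while_unconnected: bool) -> str:
    """Toy model of which resync 0.6.x ends up doing (hypothetical helper)."""
    if primary_crashed_while_connected or primary_changed_state_while_unconnected:
        # The Primary did not stay up, so nobody knows what changed.
        return "SyncingAll"
    # The Primary stayed up the whole time and still knows what changed.
    return "SyncingQuick"


# Secondary fails and rejoins, or the network hiccups; Primary stays up.
print(sync_kind_0_6(False, False))
# Unconnected Primary is shut down (the paul/saul case quoted above).
print(sync_kind_0_6(False, True))
```

The second call corresponds to the scenario quoted above: paul was Primary
without seeing saul, then went down, so a full sync follows.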

In 0.7 we have not only a persistent bitmap, but also an "activity log".
So this situation dramatically improves.
We are currently in the process of deploying an automatic,
test-harness-driven beat-all-remaining-bugs-out-of-it cluster ...
But it will still take some time for 0.7 to reach production quality.

	Lars Ellenberg