Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2004-03-31 19:27:52 -0500
\ Todd Denniston:
> Lars Ellenberg wrote:
> >
> > / 2004-03-05 06:37:46 -0800
> > \ abhin.g.s @ RASC:
> > > hi,
> <SNIP>
> > > i am finding a small trouble with the drbd configuration i am
> > > having, if i shutdown the nodes with shutdown -h 0, one at a
> > > time and reboot at next time, the secondary node will take
> > > around 4 min to sync up. is it normal or any problem?
> >
> > Best you move both to secondary state while they are still
> > connected, then shut down the network and everything else.
> >
> > Probably you have to tweak a little bit the order of your
> > init.d/*/* resource scripts.
> >
> >         Lars Ellenberg
>
> As I think I am seeing the result of this, it bit me when I took both nodes
> down yesterday, please tell me if this interpretation is correct.
>
> node paul, drbd primary and heartbeat active node exporting drives.
> node saul, drbd secondary and ignoring the rest of the world but paul.
>
> on saul, issue `shutdown -h now`
> wait for saul to power down fully.
> write ~6Mb to a drbd controlled filesystem.
> on paul, issue `shutdown -h now`
> wait for paul to power down fully.
>
> boot paul, allow it to start loading the kernel (i.e. a large head start) then
> boot saul.
>
> expected result (from my brain): SyncingQuick
> only a little bit changed on paul, and saul was never in primary after it or
> paul went down, AND paul got to come back first.
>
> experienced result (cat /proc/drbd): SyncingAll
>
> so both nodes have to be in secondary (and both know it) before either goes
> down, and stay that way until repowered.
>
>
> Correct???

Correct from concept.
BUT: unfortunately not implemented (in 0.6.x) :(

The 0.6.X series does NOT yet have a persistent bitmap, which is
needed for the "Quick" sync. So after the reboot, there is no way
for the nodes to *know* that only a little bit has changed.
The only thing they know is that one was primary, without
seeing the other one, and then went down.

So in 0.6.x we need a full synchronization
 * whenever a connected Primary node went down (crash)
 * whenever an UNconnected Primary node changed state

Or, to put it the other way, we ONLY have a "Quick" sync if the
current Primary stays up, i.e. the Secondary fails and rejoins, or the
much more likely case of a replication network hiccup.

In 0.7 we have not only a persistent bitmap, but also an
"activity log". So this situation dramatically improves.

We are currently in the process of deploying an automatic test
harness driven beat-all-remaining-bugs-out-of-it cluster ...
But it will still need some time to have 0.7 production quality.

        Lars Ellenberg
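
For readers who prefer to see the 0.6.x rule spelled out, here is a
minimal sketch in Python. It is only a model of the two conditions
listed above, not DRBD code: the names Outage, resync_kind_06x and
both flags are made up for illustration, and only the strings
SyncingQuick / SyncingAll come from the /proc/drbd output quoted
in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class Sync(Enum):
    QUICK = "SyncingQuick"
    ALL = "SyncingAll"


@dataclass
class Outage:
    """Hypothetical summary of what happened before the nodes reconnect."""
    connected_primary_went_down: bool = False
    unconnected_primary_changed_state: bool = False


def resync_kind_06x(outage: Outage) -> Sync:
    # 0.6.x keeps no persistent bitmap, so the record of changed blocks
    # survives only as long as the current Primary stays up.
    if (outage.connected_primary_went_down
            or outage.unconnected_primary_changed_state):
        return Sync.ALL
    return Sync.QUICK


# Todd's test: saul (Secondary) is already down, then paul (Primary)
# is shut down, i.e. an unconnected Primary changes state.
todd = Outage(unconnected_primary_changed_state=True)
print(resync_kind_06x(todd).value)

# Secondary fails and rejoins, or the replication link has a hiccup,
# while the Primary stays up the whole time.
print(resync_kind_06x(Outage()).value)
```

Under these assumptions the first call reports a full sync, matching
what Todd saw, and the second reports a quick sync.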