[DRBD-user] Scheduled Replication Feasible?

Tue Feb 13 18:08:14 CET 2007

/ 2007-02-13 11:49:42 -0500
\ Ross S. W. Walker:
> > -----Original Message-----
> > From: drbd-user-bounces at lists.linbit.com 
> > [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars 
> > Ellenberg
> > Sent: Tuesday, February 13, 2007 8:47 AM
> > To: drbd-user at lists.linbit.com
> > Subject: Re: [DRBD-user] Scheduled Replication Feasible?
> > 
> > / 2007-02-12 12:59:13 -0500
> > \ Ross S. W. Walker:
> > > > > What is the downside to increasing al-extents, just longer
> > > > > re-syncs?
> > > > 
> > > > longer resyncs in case of primary crash.
> > > 
> > > Oh, so if the replica is disconnected and standalone, then when the
> > > secondary re-connects what happens then if the data written on the
> > > primary is larger than al-extents?
> > 
> > there is probably a misconception about what those "al-extents" are.
> > 
> > we have a "dirty bitmap", which tracks, well, "dirty" blocks,
> > where dirty is: modified locally and not mirrored.
> > 
> > AND we have the activity log (the size of which gets tuned with the
> > al-extents parameter). this tracks _active_ extents, that is
> > extents to which io may be "in flight".
> > 
> > in case we recover from a primary crash, we have no way to know
> > whether the "in flight" io has made it to (which) disk.
> > 
> > so for all extents covered by the activity log,
> > we flag the corresponding bits in the bitmap as "dirty".
> > 
> > upon resync, the bitmaps of the nodes get ORed together,
> > and the blocks corresponding to dirty bits get resynced.
> 
> Ok so the extents are the activity log, like a journal on a file-system,
> a crash is like an unclean dismount and during a reconnect the
> activity-log gets replayed between primary and secondary by means of
> ORing it with the bitmap of modified sectors/blocks/pages?
> 
> Good analogy?

not quite, but closer.
we "journal" only which extents (the extent numbers)
are in the "active set". we do not journal data.
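
as an illustration of the recovery path described above, here is a
minimal Python sketch. it is a toy model, not drbd code: the function
names and the tiny BLOCKS_PER_EXTENT value are assumptions made up for
readability. it only shows that the activity log holds extent numbers
(no data), that after a primary crash every block of an active extent is
flagged dirty, and that the nodes' bitmaps are ORed before the resync.

```python
# Toy model only; names and sizes are hypothetical, not DRBD internals.
BLOCKS_PER_EXTENT = 4  # assumed tiny value so the example stays readable


def recover_after_primary_crash(dirty_bitmap, active_extents):
    """Return a new bitmap with every block of each active extent flagged dirty.

    dirty_bitmap   -- set of block numbers modified locally and not mirrored
    active_extents -- set of extent numbers recorded in the activity log
    """
    recovered = set(dirty_bitmap)
    for extent in active_extents:
        first = extent * BLOCKS_PER_EXTENT
        recovered.update(range(first, first + BLOCKS_PER_EXTENT))
    return recovered


def blocks_to_resync(local_bitmap, peer_bitmap):
    """OR the two nodes' bitmaps; the union is the set of blocks to resync."""
    return sorted(set(local_bitmap) | set(peer_bitmap))


local = recover_after_primary_crash({1}, {2})   # extent 2 was "active"
plan = blocks_to_resync(local, {3})             # peer had block 3 dirty
```

in this model a larger activity log means more extents can be active at
crash time, so more blocks get flagged and the resync after a primary
crash takes longer.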

> The extents "activity log" just contains the ios in process of being
> sync'd, it would mark it in the al-extents and then clear it from the
> bitmap. If a crash occurs mid-synchronization, then when it recovers it
> ORs the al-extents with the bitmap, it then should only need to update
> what was in the al-extents at the time of the crash and not the whole
> partition.
> 
> Is that accurate then?

please distinguish between "synchronization" in the sense of a resync
when communication is restored after communication loss,
and "synchronization" in the sense of live replication/mirroring,
which is normal drbd operation, and generally not called synchronization.

> > > > > Does all data covered in al-extends get re-synced?
> > > > 
> > > > after a primary crash :)
> > > 
> > > Only after a crash? What about after a disconnect and re-connect?
> > 
> > whenever necessary, we resync the changed blocks.
> > when you just disconnect/reconnect,
> > the activity log is not involved.
> 
> Ok, so a "crash" is defined as an active synchronization interrupted and

a crash is a Primary going down without
first being cleanly demoted to Secondary.
this has nothing to do with "active synchronization interrupted".

> NOT a disconnect, using the above file-system analogy, a disconnect is
> an unmount,

no. in the above analogy, there is no "disconnect".
closest, still bad, analogy of a disconnect would be
one disk falling out of a raid1, where the raid1 would
need to have a similar concept of "dirty bitmap" to be used
when you later "hot-add" it again.

> a "crash" is either the server, drbd process/module or
> network unexpectedly terminating a synchronization.

no, see above.

> What will happen when the secondary disconnects for an extended period,
> say 12 hours, the primary switches to "standalone", and then we tell the
> primary to start accepting connections again and the secondary
> reconnects? Will it see that al-extents is clean, because the last
> scheduled synchronization completed cleanly and just walk the bitmap of
> dirty blocks?

yes. no "crash" as defined above, so the activity log is not relevant
for the resync; only the dirty bitmap is used.
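
a rough Python sketch of this disconnect/reconnect case, under the same
caveat: PrimaryModel and its methods are hypothetical names for a toy
model, not drbd interfaces. while disconnected, writes only set bits in
the dirty bitmap; on reconnect exactly those blocks are resynced.

```python
class PrimaryModel:
    """Toy model of a primary node; all names here are hypothetical."""

    def __init__(self):
        self.connected = True
        self.dirty_bitmap = set()  # blocks written locally but not mirrored

    def disconnect(self):
        self.connected = False

    def write(self, block):
        # While connected the write is mirrored live, so nothing is marked.
        if not self.connected:
            self.dirty_bitmap.add(block)

    def reconnect(self):
        """Return the blocks to resync; no activity log is consulted."""
        self.connected = True
        to_sync = sorted(self.dirty_bitmap)
        self.dirty_bitmap.clear()
        return to_sync


node = PrimaryModel()
node.write(5)            # mirrored live, not marked dirty
node.disconnect()
for block in (7, 7, 2):  # rewriting block 7 still costs a single bit
    node.write(block)
resync_plan = node.reconnect()
```

how long the nodes stay disconnected does not change the mechanism in
this model; it only changes how many bits end up set.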

> At what point will the primary decide, you know there is just too much
> data changed between us to reliably sync all these changes, so for
> consistency's sake let's re-sync all data?

no. we always sync only what the bitmap marks. if we want to have a
"FullSync", we first set all the bits, then do the normal resync.
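
to make that concrete, a small Python sketch (again a toy model with
made-up names, not drbd code): a "FullSync" is nothing special, it just
fills the bitmap and then runs the same resync routine as always.

```python
def mark_all_dirty(total_blocks):
    """Model of 'set all the bits': every block number becomes dirty."""
    return set(range(total_blocks))


def resync(bitmap, copy_block):
    """Copy each dirty block to the peer, then clear the bitmap.

    copy_block is a hypothetical callback standing in for the transfer.
    Returns the number of blocks copied.
    """
    copied = 0
    for block in sorted(bitmap):
        copy_block(block)
        copied += 1
    bitmap.clear()
    return copied


sent = []
partial = resync({3, 1}, sent.append)           # ordinary resync
full = resync(mark_all_dirty(8), sent.append)   # "FullSync" of 8 blocks
```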

> Does this happen,

no.

> or is the bitmap sufficient to mark each and every
> sector/block/page that has been modified?

yes.

> > > So we would tell heartbeat that if we are not primary to 
> > > not auto-start drbd, just hang out until further notice, for if it
> > > is scheduled replication we will be handling this through cron
> > > script. 
> > > Only if we are primary do we need to start drbd.
> > 
> > If I'd have scheduled replication,
> > I'd not involve heartbeat in any way.
> > 
> > I would feel very bad about automating a failover to stale data.
> 
> True, so in this instance drbd without heartbeat would be best.
> 
> I want to thank you for all the information you have provided so far; it
> has been a tremendous help in planning out this deployment and hopefully
> it may even enlighten some list users in the process!
> 
> -Ross

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.