/ 2007-02-13 11:49:42 -0500
\ Ross S. W. Walker:
> > -----Original Message-----
> > From: drbd-user-bounces at lists.linbit.com
> > [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars
> > Ellenberg
> > Sent: Tuesday, February 13, 2007 8:47 AM
> > To: drbd-user at lists.linbit.com
> > Subject: Re: [DRBD-user] Scheduled Replication Feasible?
> >
> > / 2007-02-12 12:59:13 -0500
> > \ Ross S. W. Walker:
> > > > > What is the down side to increasing al-extents, just longer
> > > > re-syncs?
> > > >
> > > > longer resyncs in case of primary crash.
> > >
> > > Oh, so if the replica is disconnected and standalone, then when the
> > > secondary re-connects what happens then if the data written on the
> > > primary is > then al-extents?
> >
> > there is probably a misconception about what those "al-extents" are.
> >
> > we have a "dirty bitmap", which tracks, well, "dirty" blocks,
> > where dirty is: modified locally and not mirrored.
> >
> > AND we have the activity log (the size of which gets tuned with the
> > al-extents parameter). this tracks _active_ extents, that is
> > extents to which io may be "on the fly".
> >
> > in case we recover from a primary crash, we have no way to know
> > whether the "in flight" io has made it to (which) disk.
> >
> > so for all extents covered by the activity log,
> > we flag the corresponding bits in the bitmap as "dirty".
> >
> > upon resync, the bitmaps of the nodes get ORed together,
> > and the blocks corresponding to dirty bits get resynced.
>
> Ok so the extents is the activity log, like a journal on a file-system,
> a crash is like an unclean dismount and during a reconnect the
> activity-log gets replayed between primary and secondary by means of
> ORing it with the bitmap of modified sectors/blocks/pages?
>
> Good analogy?

not quite, but closer.
we "journal" only which extents (the extent numbers) are in the
"active set". we do not journal data.
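
The recovery steps described above (flag the active extents as dirty, OR
the two bitmaps, resync the dirty blocks) can be sketched as a toy model.
This is only an illustration under stated assumptions, not DRBD's code:
the function names are made up, BLOCKS_PER_EXTENT is a hypothetical value
chosen small for readability, and the bitmap and activity log are modelled
as plain Python sets of block and extent numbers.

```python
# Toy model only: names and sizes are illustrative, not DRBD internals.
BLOCKS_PER_EXTENT = 4  # hypothetical; chosen small for readability


def blocks_of_extent(extent):
    """Return the block numbers covered by one activity-log extent."""
    start = extent * BLOCKS_PER_EXTENT
    return set(range(start, start + BLOCKS_PER_EXTENT))


def bitmap_after_primary_crash(dirty_bitmap, active_extents):
    """Mark every block of every active extent as dirty.

    After a primary crash we cannot know which in-flight writes reached
    which disk, so the whole active set is treated as out of sync.
    Only extent numbers are logged; no data is journaled or replayed.
    """
    recovered = set(dirty_bitmap)
    for extent in active_extents:
        recovered |= blocks_of_extent(extent)
    return recovered


def blocks_to_resync(bitmap_a, bitmap_b):
    """OR the two nodes' bitmaps; the result is what gets resynced."""
    return bitmap_a | bitmap_b


# Crashed primary: block 9 was already dirty, extents 0 and 2 were active.
primary = bitmap_after_primary_crash({9}, {0, 2})
# Peer: block 17 is marked dirty on its side.
secondary = {17}
print(sorted(blocks_to_resync(primary, secondary)))
```

In this model a larger activity log means more extents can be in the
active set at crash time, hence more blocks flagged dirty and a longer
resync after a primary crash, which is the trade-off mentioned at the
start of the thread.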
> The extents "activity log" just contains the ios in process of being
> sync'd, it would mark it in the al-extents and then clear it from the
> bitmap. If a crash occurs mid-synchronization, then when it recovers it
> ORs the al-extents with the bitmap, it then should only need to update
> what was in the al-extents at the time of the crash and not the whole
> partition.
>
> Is that accurate then?

please distinguish between "synchronisation" in the sense of a resync
when communication is restored after communication loss,
and "synchronization" in the sense of live replication/mirroring,
which is normal drbd operation, and generally not called synchronization.

> > > > > Does all data covered in al-extents get re-synced?
> > > >
> > > > after a primary crash :)
> > >
> > > Only after a crash? What about after a disconnect and re-connect?
> >
> > whenever necessary, we resync the changed blocks.
> > when you just disconnect/reconnect,
> > the activity log is not involved.
>
> Ok, so a "crash" is defined as an active synchronization interrupted and

a crash is a Primary going down without cleanly being demoted to
Secondary before. this has nothing to do with "active synchronization
interrupted".

> NOT a disconnect, using the above file-system analogy, a disconnect is
> an unmount,

no. in the above analogy, there is no "disconnect".
closest, still bad, analogy of a disconnect would be one disk falling
out of a raid1, where the raid1 would need to have a similar concept of
"dirty bitmap" to be used when you later "hot-add" it again.

> a "crash" is either the server, drbd process/module or
> network unexpectedly terminating a synchronization.

no, see above.

> What will happen when the secondary disconnects for an extended period,
> say 12 hours, the primary switches to "standalone", and then we tell the
> primary to start accepting connections again and the secondary
> reconnects?
> Will it see that al-extents is clean, because the last
> scheduled synchronization completed cleanly and just walk the bitmap of
> dirty blocks?

yes. no "crash" as defined above, no al-extent relevant for sync.

> At what point will the primary decide, you know there is just too much
> data changed between us to reliably sync all these changes, so for
> consistency's sake let's re-sync all data?

no. we always only sync the bitmap.
if we want to have a "FullSync", we first set all the bits,
then do the normal resync.

> Does this happen,

no.

> or is the bitmap sufficient to mark each and every
> sector/block/page that has been modified?

yes.

> > > So we would tell heartbeat that if we are not primary to
> > > not auto-start drbd, just hang out until further notice, for if it
> > > is scheduled replication we will be handling this through cron
> > > script.
> > > Only if we are primary do we need to start drbd.
> > >
> > If I'd have scheduled replication,
> > I'd not involve heartbeat in any way.
> >
> > I would feel very bad about automating a failover to stale data.
>
> True, so in this instance drbd without heartbeat would be best.
>
> I want to thank you for all the information you have provided so far it
> has been a tremendous help in planning out this deployment and hopefully
> it may even enlighten some list users in the process!
>
> -Ross

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
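
As an illustration of the answers above about reconnecting after a long
disconnect and about "FullSync", here is a minimal sketch. It assumes a
toy model in which the bitmap is a Python set of block numbers;
resync_plan, total_blocks and full_sync are hypothetical names, not part
of DRBD.

```python
# Toy model only: hypothetical names, not DRBD's actual code.

def resync_plan(dirty_bitmap, total_blocks, full_sync=False):
    """Return the sorted block numbers a resync would copy.

    There is one resync path: walk the bitmap. A full sync merely
    sets every bit first and then takes that same path. The activity
    log plays no role here, since no primary crash occurred.
    """
    bitmap = set(dirty_bitmap)
    if full_sync:
        bitmap = set(range(total_blocks))
    return sorted(bitmap)


# Secondary was away for 12 hours; the primary dirtied blocks 3, 4 and 7.
print(resync_plan({3, 4, 7}, total_blocks=8))
# A requested full sync on the same 8-block device.
print(resync_plan({3, 4, 7}, total_blocks=8, full_sync=True))
```

However long the disconnect lasts, the amount of data to transfer in this
model is bounded by the number of set bits, never by a separate "too much
has changed" threshold.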