Note: "permalinks" may not be as permanent as we would like;
direct links from old sources may well be a few messages off.
/ 2004-04-27 20:08:44 +0400
\ Eugene Crosser:
> I am putting this on the maillist because I think the answers should be
> interesting to everybody.
>
> On Tue, 2004-04-27 at 15:33, Lars Ellenberg wrote:
>
> > > nfsa2 was working primary, with freshly run fsck and quotacheck. At the
> > > moment when secondary came up and started synchronizing VFS error
> > > messages began appearing in the log. And yes, fsck and quotacheck find
> > > plenty of errors on such filesystem.
> > >
> > > Apr 27 12:03:09 nfsa1.mail.back kernel: drbd0: size = 214165504 KB
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Marked additional 1052672 KB as out-of-sync based on AL.
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Connection established.
> > > Apr 27 12:03:10 nfsa2.mail.back kernel: drbd0: Connection established.
> > > Apr 27 12:03:10 nfsa2.mail.back kernel: drbd0: Resync started as target (need to sync 2450904 KB).
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Resync started as source (need to sync 2450904 KB).
> >
> > this is the problem.
> > see, you say you have the nfsa2 primary.
> > but on connection, DRBD decides that the primary node shall become sync
> > *TARGET*, so it gets overwritten with the BAD data from nfsa1 UNDERNEATH
> > the file system, which obviously corrupts the contents :(
>
> So, how exactly does drbd decide which node to make syncsource and which
> synctarget? In 0.6 it was pretty easy: primary is source.

no. not exactly.
comparing meta data generation counters is the same as in 0.6.
the difference is that *after* the decision, 0.6 would try to promote
the sync source to primary (regardless of any cluster manager; that's
why you needed to wait for the sync before starting heartbeat!),
and if that was not possible, because the sync *target* *WAS* already
primary, you got this "predetermined states are in contradiction to
GC's" message.

in 0.7, node state (role: secondary/primary) has been decoupled from
sync (so you can have a Primary Sync Target). this is intended to
enable heartbeat to promote a sync target secondary to primary.

but obviously there is some check missing that prevents a running
standalone primary from becoming sync target on connect (after e.g. a
split brain situation).

> Is it wise at all to allow primary be a target?

we thought it would make interaction with heartbeat easier.

> Will the application be happy if the data is magically modified?

if we do it right, yes; it won't notice.

> How to avoid problems?

do it right :)

	Lars Ellenberg
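
To illustrate the decision described above, here is a toy sketch in Python. It is not DRBD code and does not reflect DRBD's real data structures: the generation counters are modelled as a plain tuple of integers compared left to right, and the names `Node`, `SyncDecisionError` and `decide_on_connect` are made up for this example. The guard at the end stands in for the check the message says is missing, namely refusing to turn a node that is already running as primary into the sync target at connect time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class SyncDecisionError(Exception):
    """Raised when connecting would overwrite a running primary (hypothetical)."""


@dataclass
class Node:
    name: str
    # Hypothetical stand-in for the meta data generation counters.
    gen_counters: Tuple[int, ...]
    is_primary: bool


def decide_on_connect(a: Node, b: Node) -> Optional[Node]:
    """Return the node that should act as sync source, or None if no sync is needed.

    Assumption for this sketch: the node with the "newer" counters
    (larger tuple, compared left to right) has the good data.
    """
    if a.gen_counters == b.gen_counters:
        return None  # both sides agree, nothing to resync

    source, target = (a, b) if a.gen_counters > b.gen_counters else (b, a)

    # The role (primary/secondary) does not influence the direction,
    # but a node already running as primary must not be overwritten
    # underneath its mounted file system.
    if target.is_primary:
        raise SyncDecisionError(
            f"{target.name} is primary but would become sync target; "
            "refusing to connect"
        )
    return source
```

With made-up counters, a call such as `decide_on_connect(Node("nfsa1", (5, 3, 1), False), Node("nfsa2", (5, 2, 1), True))` raises `SyncDecisionError` in this sketch instead of silently resyncing over the primary. Promoting a secondary that is already a sync target is a separate path and is not modelled here.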