[DRBD-user] Re: Another log - primary corruption when secondary comes up

Lars Ellenberg Lars.Ellenberg at linbit.com
Tue Apr 27 18:40:11 CEST 2004


/ 2004-04-27 20:08:44 +0400
\ Eugene Crosser:
> I am putting this on the mailing list because I think the answers should be
> interesting to everybody.
> 
> On Tue, 2004-04-27 at 15:33, Lars Ellenberg wrote:
> 
> > > nfsa2 was working as primary, with freshly run fsck and quotacheck.  At the
> > > moment when the secondary came up and started synchronizing, VFS error
> > > messages began appearing in the log.  And yes, fsck and quotacheck find
> > > plenty of errors on such a filesystem.
> > > 
> > > Apr 27 12:03:09 nfsa1.mail.back kernel: drbd0: size = 214165504 KB
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Marked additional 1052672 KB as out-of-sync based on AL.
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Connection established.
> > > Apr 27 12:03:10 nfsa2.mail.back kernel: drbd0: Connection established.
> > > Apr 27 12:03:10 nfsa2.mail.back kernel: drbd0: Resync started as target (need to sync 2450904 KB).
> > > Apr 27 12:03:10 nfsa1.mail.back kernel: drbd0: Resync started as source (need to sync 2450904 KB).
> > 
> > this is the problem.
> > see, you say nfsa2 is primary.
> > but on connection, DRBD decides that the primary node shall become sync
> > *TARGET*, so it gets overwritten with the BAD data from nfsa1 UNDERNEATH
> > the file system, which obviously corrupts the contents :(
> 
> So, how exactly does drbd decide which node to make sync source and which
> sync target?  In 0.6 it was pretty easy: primary is source.

no, not exactly.
the decision is made by comparing meta data generation counters (GCs),
in 0.7 the same way as in 0.6.
the difference is that *after* the decision, 0.6 would try to promote the
sync source to primary (regardless of any cluster manager; that's why you
needed to wait for the sync before starting heartbeat!), and if that was
not possible because the sync *target* *WAS* already primary, you got this
"predetermined states are in contradiction to GC's" message.

in 0.7, the node role (secondary/primary) has been decoupled from the
sync direction (so you can have a Primary Sync Target). this is intended
to enable heartbeat to promote a secondary that is sync target to primary.

but obviously a check is missing that would prevent a running
standalone primary from becoming sync target on connect (e.g. after a
split brain situation).
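
to illustrate the kind of check meant here, a rough sketch in Python.
this is not the actual DRBD code: the Node class, the counter values,
the plain tuple comparison and the RefuseConnect exception are all made
up for the example, and the real generation counter comparison is more
involved.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    role: str                      # "Primary" or "Secondary"
    gen_counters: Tuple[int, ...]  # made-up stand-in for the meta data GCs


class RefuseConnect(Exception):
    """Hypothetical error: connecting would overwrite a running primary."""


def pick_direction(a: Node, b: Node) -> Optional[Tuple[Node, Node]]:
    """Return (source, target), or None when no resync is needed.

    Assumption: the node with the "higher" counters becomes sync source,
    using plain tuple comparison as a simplification.
    """
    if a.gen_counters == b.gen_counters:
        return None
    return (a, b) if a.gen_counters > b.gen_counters else (b, a)


def on_connect(a: Node, b: Node, guard: bool) -> Optional[Tuple[Node, Node]]:
    """Decide the sync direction at connect time.

    With guard=True, refuse to turn an already running primary into the
    sync target instead of overwriting its data underneath the file system.
    """
    decision = pick_direction(a, b)
    if decision is None:
        return None
    source, target = decision
    if guard and target.role == "Primary":
        raise RefuseConnect(
            f"{target.name} is Primary but would become sync target"
        )
    return decision


if __name__ == "__main__":
    # Counter values are invented to reproduce the direction seen in the log.
    nfsa1 = Node("nfsa1", "Secondary", (2, 1, 1))
    nfsa2 = Node("nfsa2", "Primary", (1, 1, 1))
    source, target = on_connect(nfsa1, nfsa2, guard=False)
    print(f"without guard: {source.name} -> {target.name}")
    try:
        on_connect(nfsa1, nfsa2, guard=True)
    except RefuseConnect as err:
        print(f"with guard: refused ({err})")
```

with guard=False the running primary nfsa2 becomes sync target, which is
the situation in the log above; with guard=True the connect is refused
instead. a primary that gets promoted *after* it became sync target is
not affected, since the check only looks at the roles at connect time.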


> Is it wise at all to allow a primary to be a target?

we thought it would make interaction with heartbeat easier.

>  Will the application be happy if the data is magically modified?

if we do it right, yes; it won't notice.

>  How to avoid problems?

do it right :)


	Lars Ellenberg


