[DRBD-user] error (device drbd0): ext3_free_blocks: bit already cleared

Mon May 17 16:06:50 CEST 2004

/ 2004-05-17 17:15:02 +0400
\ Eugene Crosser:
> This is today's CVS drbd 0.7_pre7

I did a few more check-ins today, please cvs update :)

one of those fixed a recently introduced thinko which led to
parts of the device that needed a sync not being synced at all!
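To illustrate why that kind of bug is so nasty, here is a toy model of bitmap-driven resync. It is an illustrative sketch only, not DRBD's code: the names `write` and `resync` are hypothetical, a "device" is just a Python list of blocks, and the out-of-sync bitmap is a set of block numbers. Resync copies only the blocks that were marked, so a write that never sets its bit stays stale on the peer without any error.

```python
def write(device: list[bytes], out_of_sync: set[int], block: int,
          data: bytes, mark: bool = True) -> None:
    """Write a block while the peer is unreachable.

    Normally the block is also marked out of sync; mark=False simulates a
    (hypothetical) bug where the bitmap update is forgotten.
    """
    device[block] = data
    if mark:
        out_of_sync.add(block)


def resync(source: list[bytes], target: list[bytes],
           out_of_sync: set[int]) -> None:
    """Copy only the blocks flagged in the bitmap from source to target."""
    for block in sorted(out_of_sync):
        target[block] = source[block]
    out_of_sync.clear()


if __name__ == "__main__":
    primary = [b"A", b"B", b"C", b"D"]
    secondary = list(primary)
    dirty: set[int] = set()
    write(primary, dirty, 1, b"B2")              # marked: will be resynced
    write(primary, dirty, 3, b"D2", mark=False)  # unmarked: silently skipped
    resync(primary, secondary, dirty)
    print(secondary)  # block 3 still holds the old b"D" on the secondary
```

The resync reports success either way; nothing in the model can notice the skipped block, which is why such a bug only shows up later as filesystem damage.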

> It may be related to DRBD or Jan's recent changes in ext3.
> After a couple switchovers of primary and secondary, right when nfsa1
> was booting, primary nfsa2 got a filesystem panic (which did not actually
> halt the system, but that is another story).
> 

I remember a story about a panic and an oops canceling each other out
on an SMP box with reiserfs ... :-/

> May 17 16:49:18 nfsa2.mail.back kernel: tg3: eth1: Link is up at 1000 Mbps, full duplex.
> May 17 16:49:18 nfsa2.mail.back kernel: tg3: eth1: Flow control is on for TX and on for RX.
> May 17 16:49:33 nfsa2.mail.back kernel: EXT3-fs error (device drbd0): ext3_free_blocks: bit already cleared for block 43658531
> May 17 16:49:33 nfsa2.mail.back kernel: Kernel panic: EXT3-fs (device drbd0): panic forced after error
> May 17 16:49:33 nfsa2.mail.back kernel: 
> May 17 16:49:53 nfsa1.mail.back kernel: drbd0: size = 214165504 KB
> May 17 16:49:54 nfsa1.mail.back kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
> May 17 16:49:54 nfsa1.mail.back kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
> May 17 16:49:54 nfsa1.mail.back kernel: drbd0: Connection established.
> May 17 16:49:54 nfsa1.mail.back kernel: drbd0: Resync started as target (need to sync 1817204 KB).
> May 17 16:49:55 nfsa1.mail.back kernel: process `snmpd' is using obsolete setsockopt SO_BSDCOMPAT
> May 17 16:50:42 nfsa1.mail.back kernel: drbd0: Resync done (total 48 sec; 37858 K/sec)
> 
> After making nfsa1 primary (sync completed successfully after filesystem
> panic, and drbd status was "Consistent"), fsck displayed a lot of
> 

now, if the filesystem was already corrupt *before* the sync, the sync
won't make that corruption vanish: it copies the blocks as they are...

> Inode 2240702 is in use, but has dtime set.  Fix<y>? yes
> Inode 2240702 has imagic flag set.  Clear<y>? yes
> Inode 2240675 has compression flag set on filesystem without compression support.  Clear<y>? yes
> Inode 2240675 has illegal block(s).  Clear<y>? yes
> Inode 2240688, i_size is 4920270980849887604, should be 0.  Fix<y>? yes
> Inode 2240688, i_blocks is 1248814951, should be 0.  Fix<y>? yes
> Inode 2240696 has INDEX_FL flag set but is not a directory.
> Clear HTree index<y>? yes
> 
> Eugene

I am happy to announce that in my test setup (UML only, no real hw at hand
right now; I will do the same soon on the linbit and suse test clusters),

several concurrent
cp -au /bin  /boot  /etc  /home  /lib  /opt  /root  /sbin  /tmp  /usr  /var . &
where . is an XFS on top of DRBD,
survived all the network and node failures my test harness (which finally seems to work)
was throwing at it: several failovers, XFS journal replays, ...
and a recursive comparison afterwards proved correctness (apart from some /etc/whatever
files, which differ between the two UMLs, so this is OK).
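The post does not say which tool did the recursive comparison (`diff -r` would be a common choice). As a hedged illustration only, here is a small Python sketch of such a check; `compare_trees` and its `ignore_prefixes` parameter are hypothetical names, with `ignore_prefixes` standing in for the node-specific /etc files that legitimately differ. It compares file contents byte by byte (`filecmp.cmp(..., shallow=False)`) instead of trusting size and mtime, because `cp -a` preserves timestamps and a corrupted copy could otherwise look identical.

```python
import filecmp
import os


def _relative_paths(root: str) -> set[str]:
    """Collect every file and directory below root, relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def compare_trees(left: str, right: str,
                  ignore_prefixes: tuple[str, ...] = ()) -> list[str]:
    """Return sorted relative paths that differ between two directory trees.

    A path differs if it exists on only one side, if it is a directory on one
    side but not the other, or if both sides hold regular files whose contents
    differ. Paths equal to or below an entry of ignore_prefixes are skipped.
    """
    left_paths = _relative_paths(left)
    right_paths = _relative_paths(right)
    diffs = []
    for rel in sorted(left_paths | right_paths):
        if any(rel == p or rel.startswith(p + os.sep) for p in ignore_prefixes):
            continue
        if rel not in left_paths or rel not in right_paths:
            diffs.append(rel)  # present on one side only
            continue
        lpath, rpath = os.path.join(left, rel), os.path.join(right, rel)
        if os.path.isdir(lpath) != os.path.isdir(rpath):
            diffs.append(rel)  # file on one side, directory on the other
        elif os.path.isfile(lpath) and os.path.isfile(rpath):
            # shallow=False: compare contents, not just the stat signature,
            # because cp -a preserves timestamps.
            if not filecmp.cmp(lpath, rpath, shallow=False):
                diffs.append(rel)
    return diffs
```

Called as `compare_trees(original, copy, ignore_prefixes=("etc",))`, it returns an empty list when everything outside the ignored subtree matches.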

So we are getting closer!

	Lars Ellenberg