[DRBD-user] Opteron/Xeon - Kernel 2.4/2.6 setup

Eugene Crosser crosser at rol.ru
Wed Jun 16 21:28:51 CEST 2004



On Tue, 2004-06-15 at 13:08, Lars Ellenberg wrote:

> > so I implemented it from scratch now, should find its way into cvs today.
> > since it is new (written yesterday in one go), it may of course have new
> > bugs, but it is MUCH more readable and thus easier to debug...
> 
> can you do me a favor please: my local test setup is not able to reproduce
> the problem, so I cannot verify if this fixes it...
> 
> use current cvs (needs to have drbd_bitmap.c), and the patch below
> (unless philipp already checked it in), and see whether your test setup
> can verify that now we no longer have data corruption.

I could not run the tests in *exactly* the same environment as before, but I
did my best.  I installed two similar machines (Dell 1750); the only
difference in hardware was the lack of the Megaraid controller.  Instead, I
used a plain 18 GB U320 drive.  I also could not impose a real-world load;
instead, I started 25 processes that were creating multiple files in
multiple directories over NFS.  This ran for several hours, producing about
3 Mb/sec of disk IO according to iostat.

Originally, nfsa1 was primary.  When the scripts finished, I cut power to
nfsa1, made nfsa2 primary and ran fsck.  It was clean.  So far, so good.

Now, *without* further activity on the device, I powered up nfsa1.  It
started synching and was done in a few seconds.  Then I powered down
nfsa2, made nfsa1 primary again and ran fsck.  It found lots of errors! 
Mostly duplicate blocks in inodes.

Sorry, I do not have syslog here (yet).  Below is a cut-and-paste from the
serial console of the former primary; the lines are clipped.  This is when
it was "synching back":

Jun 16 19:00:58 nfsa1 kernel: drbd0: resync bitmap: bits=4413041 words=137908
Jun 16 19:00:58 nfsa1 kernel: drbd0: size = 17652164 KB
Jun 16 19:00:58 nfsa1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Jun 16 19:00:58 nfsa1 kernel: drbd0: Found 6 transactions (324 active extents) .
Jun 16 19:00:58 nfsa1 kernel: drbd0: ASSERT( 0 ) in drivers/block/drbd/drbd_bit2
Jun 16 19:00:58 nfsa1 last message repeated 176 times
Jun 16 19:00:58 nfsa1 kernel: drbd0: Marked additional 163840 KB as out-of-sync.[drbd0]
Waiting until resources are connected (or timeouted)
Jun 16 19:00:58 nfsa1 kerne.Jun 16 19:00:58 nfsa1 kernel: drbd0: now Secondary/Primary
Jun 16 19:00:58 nfsa1 kernel: drbd0: Resync started as SyncTarget (need to sync.
Jun 16 19:01:33 nfsa1 kernel: drbd0: Resync done (total 35 sec; 37454 K/sec)

Eugene



