[DRBD-user] Opteron/Xeon - Kernel 2.4/2.6 setup
crosser at rol.ru
Wed Jun 16 21:28:51 CEST 2004
On Tue, 2004-06-15 at 13:08, Lars Ellenberg wrote:
> > so I implemented it from scratch now, should find its way into cvs today.
> > since it is new (written yesterday in one go), it may of course have new
> > bugs, but it is MUCH more readable and thus easier to debug...
> can you do me a favor please: my local test setup is not able to reproduce
> the problem, so I cannot verify if this fixes it...
> use current cvs (needs to have drbd_bitmap.c), and the patch below
> (unless philipp already checked it in), and see whether your test setup
> can verify that now we no longer have data corruption.
I could not run the tests in *exactly* the same environment as before, but I
did my best. I installed two similar machines (Dell 1750); the only
difference in hardware was the lack of a Megaraid. Instead, I used a plain
18 GB U320 drive. I also could not impose a real-world load; instead, I
started 25 processes that were creating multiple files in multiple
directories over NFS. This ran for several hours, producing about 3 MB/sec
of disk I/O according to iostat.
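For anyone who wants to generate a comparable load, here is a minimal
sketch of such a generator. It is not the script used for the test above:
the names run_load and worker, the directory layout, and the file counts
and sizes are all hypothetical, and it assumes a Unix host with the NFS
export already mounted at the path passed as root.

```python
import multiprocessing
import os


def worker(root, worker_id, n_dirs, n_files, file_size):
    """Create n_files files in each of n_dirs directories under root.

    Hypothetical helper; the layout and sizes are illustrative only.
    """
    payload = os.urandom(file_size)
    for d in range(n_dirs):
        dirpath = os.path.join(root, "w%02d" % worker_id, "d%03d" % d)
        os.makedirs(dirpath, exist_ok=True)
        for f in range(n_files):
            path = os.path.join(dirpath, "f%04d" % f)
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # ask for the data to reach the server


def run_load(root, n_workers=25, n_dirs=10, n_files=100, file_size=4096):
    """Start n_workers processes and wait for all of them to finish.

    Returns True if every worker exited cleanly.
    """
    # The "fork" start method is Unix-only; it is chosen here so that the
    # sketch does not depend on how the module is imported.
    ctx = multiprocessing.get_context("fork")
    procs = [
        ctx.Process(target=worker, args=(root, i, n_dirs, n_files, file_size))
        for i in range(n_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return all(p.exitcode == 0 for p in procs)
```

Calling run_load("/mnt/nfs-test") (a hypothetical mount point) in a loop
for several hours would approximate the load described above; the actual
I/O rate will depend on the file size and counts chosen.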
Originally, nfsa1 was primary. When the scripts finished, I turned off the
power on nfsa1, made nfsa2 primary, and ran fsck. It was clean. So far, so
good.
Now, *without* further activity on the device, I powered up nfsa1. It
started synching and was done in a few seconds. Then I powered down
nfsa2, made nfsa1 primary again and ran fsck. It found lots of errors!
Mostly duplicate blocks in inodes.
Now, I am sorry, but I do not have the syslog here (yet). I include a
cut-and-paste from the serial console of the former primary; the lines
are clipped. This is when it was "synching back":
Jun 16 19:00:58 nfsa1 kernel: drbd0: resync bitmap: bits=4413041 words=137908
Jun 16 19:00:58 nfsa1 kernel: drbd0: size = 17652164 KB
Jun 16 19:00:58 nfsa1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Jun 16 19:00:58 nfsa1 kernel: drbd0: Found 6 transactions (324 active extents) .
Jun 16 19:00:58 nfsa1 kernel: drbd0: ASSERT( 0 ) in drivers/block/drbd/drbd_bit2
Jun 16 19:00:58 nfsa1 last message repeated 176 times
Jun 16 19:00:58 nfsa1 kernel: drbd0: Marked additional 163840 KB as out-of-sync.[drbd0]
Waiting until resources are connected (or timeouted)
Jun 16 19:00:58 nfsa1 kerne.Jun 16 19:00:58 nfsa1 kernel: drbd0: now Secondary/Primary
Jun 16 19:00:58 nfsa1 kernel: drbd0: Resync started as SyncTarget (need to sync.
Jun 16 19:01:33 nfsa1 kernel: drbd0: Resync done (total 35 sec; 37454 K/sec)
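As a sanity check on the figures in this log, the short sketch below
redoes the arithmetic. The bitmap numbers are consistent with one bit per
4 KB of device and 32-bit bitmap words; both of those are assumptions
inferred from the logged values, not taken from the DRBD source. The
resync volume is only an estimate, since the logged duration and rate are
rounded and the "need to sync" line is clipped.

```python
# Values copied from the console log above.
size_kb = 17652164        # "size = 17652164 KB"
logged_bits = 4413041     # "bits=4413041"
logged_words = 137908     # "words=137908"
marked_kb = 163840        # "Marked additional 163840 KB as out-of-sync"
resync_seconds = 35       # "total 35 sec"
resync_rate_kb_s = 37454  # "37454 K/sec"

# Assumptions inferred from the numbers (not from the DRBD source).
KB_PER_BIT = 4
BITS_PER_WORD = 32


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


bits = ceil_div(size_kb, KB_PER_BIT)
words = ceil_div(bits, BITS_PER_WORD)
print(bits, bits == logged_bits)     # 4413041 True
print(words, words == logged_words)  # 137908 True

# Estimated amount actually resynced, from duration times rate.
resync_kb = resync_seconds * resync_rate_kb_s
ratio = resync_kb / marked_kb
print(resync_kb)         # 1310890
print(round(ratio, 2))   # 8.0
```

So the resync appears to have moved roughly 1.3 GB, about eight times the
163840 KB that the log reports as marked out-of-sync. Whether that gap is
a units issue in the message or something else cannot be determined from
the clipped output alone.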