On Tue, 2004-06-15 at 13:08, Lars Ellenberg wrote:
> > so I implemented it from scratch now, should find its way into cvs today.
> > since it is new (written yesterday in one go), it may of course have new
> > bugs, but it is MUCH more readable and thus easier to debug...
>
> can you do me a favor please: my local test setup is not able reproduce
> the problem, so I cannot verify if this fixes it...
>
> use current cvs (needs to have drbd_bitmap.c), and the patch below
> (unless philipp already checked it in), and see whether your test setup
> can verify that now we no longer have data corruption.

I could not run the tests in *exactly* the same environment as before, but
I did my best. I set up two similar machines (Dell 1750); the only hardware
difference was the lack of the MegaRAID controller. Instead, I used a plain
18 GB U320 drive.

I also could not impose a real-world load; instead, I started 25 processes
that were creating multiple files in multiple directories over NFS. This
ran for several hours, producing about 3 Mb/sec of disk I/O according to
iostat.

Originally, nfsa1 was primary. When the scripts finished, I turned off the
power on nfsa1, made nfsa2 primary and ran fsck. It was clean. So far, so
good.

Now, *without* any further activity on the device, I powered up nfsa1. It
started synching and was done in a few seconds. Then I powered down nfsa2,
made nfsa1 primary again and ran fsck. It found lots of errors, mostly
duplicate blocks in inodes!

Sorry, I do not have syslog here (yet). Below is a cut-and-paste from the
serial console of the former primary; the lines are clipped. This is when
it was "synching back":

Jun 16 19:00:58 nfsa1 kernel: drbd0: resync bitmap: bits=4413041 words=137908
Jun 16 19:00:58 nfsa1 kernel: drbd0: size = 17652164 KB
Jun 16 19:00:58 nfsa1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Jun 16 19:00:58 nfsa1 kernel: drbd0: Found 6 transactions (324 active extents) .
Jun 16 19:00:58 nfsa1 kernel: drbd0: ASSERT( 0 ) in drivers/block/drbd/drbd_bit2
Jun 16 19:00:58 nfsa1 last message repeated 176 times
Jun 16 19:00:58 nfsa1 kernel: drbd0: Marked additional 163840 KB as out-of-sync.
[drbd0] Waiting until resources are connected (or timeouted)
Jun 16 19:00:58 nfsa1 kerne.
Jun 16 19:00:58 nfsa1 kernel: drbd0: now Secondary/Primary
Jun 16 19:00:58 nfsa1 kernel: drbd0: Resync started as SyncTarget (need to sync.
Jun 16 19:01:33 nfsa1 kernel: drbd0: Resync done (total 35 sec; 37454 K/sec)

Eugene