On Thu, 2004-06-17 at 00:03, Lars Marowsky-Bree wrote:
> > Now, *without* further activity on the device, I powered up nfsa1. It
> > started synching and was done in a few seconds. Then I powered down
> > nfsa2, made nfsa1 primary again and ran fsck. It found lots of errors!
> > Mostly duplicate blocks in inodes.
>
> The current CVS version is unstable and corrupts the data; this is
> reproducible on our test setups too, and on Philipp's also. It's a work
> in progress right now - hopefully fixed RSN ;-)

With the CVS version of Friday morning, I cannot reproduce the corruption,
even with significant activity and several node switchovers. Good!

I noticed a couple of things though:

1. After a manual "drbdadm disconnect", the *first* subsequent attempt to
"connect" produces a stack trace and the device stays "StandAlone". The
second attempt to "connect" succeeds and apparently no other harm is done.
Maybe it's triggered by the fact that both nodes were made primary (to
check data integrity), and then the "wrong" one was manually made
secondary before trying to connect them again.

[sorry for possibly clipped lines / lost characters]

Jun 19 18:33:23 nfsa2 kernel: drbd0: Connection established.
Jun 19 18:33:23 nfsa2 kernel: drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
Jun 19 18:33:23 nfsa2 kernel: drbd0: error receiving ReportParams, l: 68!
Jun 19 18:33:23 nfsa2 kernel: drbd0: worker terminated
Jun 19 18:33:23 nfsa2 kernel: drbd0: asender terminated
Jun 19 18:33:23 nfsa2 kernel: drbd0: Connection lost.
Jun 19 18:33:23 nfsa2 kernel: drbd0: receiver terminated
Jun 19 18:33:37 nfsa2 kernel: drbd0: drivers/block/drbd/drbd_receiver.c:1314: bitmap already locked by drivers/block/drbd/drbd_receiver.c:1314
Jun 19 18:33:37 nfsa2 kernel: Call Trace:
Jun 19 18:33:37 nfsa2 kernel: [<c023eca8>] __drbd_bm_lock+0xf4/0x13c
Jun 19 18:33:37 nfsa2 kernel: [<c024a965>] receive_param+0x188/0x5f8
Jun 19 18:33:37 nfsa2 kernel: [<c0248937>] drbd_recv_header+0x2b/0xe6
Jun 19 18:33:37 nfsa2 kernel: [<c024b523>] drbdd+0x45/0xca
Jun 19 18:33:37 nfsa2 kernel: [<c024bc54>] drbdd_init+0x69/0x163
Jun 19 18:33:37 nfsa2 kernel: [<c0251120>] drbd_thread_setup+0x78/0xde
Jun 19 18:33:37 nfsa2 kernel: [<c02510a8>] drbd_thread_setup+0x0/0xde
Jun 19 18:33:37 nfsa2 kernel: [<c0102271>] kernel_thread_helper+0x5/0xb
Jun 19 18:33:37 nfsa2 kernel:
Jun 19 18:33:37 nfsa2 kernel: drbd0: Connection established.
Jun 19 18:33:37 nfsa2 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Jun 19 18:33:37 nfsa2 kernel: drbd0: Resync started as SyncSource (need to sync 593460 KB [148365 bits set]).

Second thing: when I did "invalidate all" there were a lot of messages
like these (some characters were lost on the serial console):

Jun 18:37:01 nfsa1ernel: drbd0: dd_bm_clear_bit:leared a bitnr=30113 while Concted
Jun 19 187:01 nfsa1 kern: drbd0: drbd_bclear_bit: clead a bitnr=23301 while Connecte

and for several seconds /proc/drbd shows "99%" synched. After that, the
messages stop, the "Resync started" message appears and /proc/drbd starts
showing reasonable figures.

Eugene