[DRBD-user] Opteron/Xeon - Kernel 2.4/2.6 setup

Eugene Crosser crosser at rol.ru
Sat Jun 19 21:11:51 CEST 2004


On Thu, 2004-06-17 at 00:03, Lars Marowsky-Bree wrote:

> > Now, *without* further activity on the device, I powered up nfsa1.  It
> > started synching and was done in a few seconds.  Then I powered down
> > nfsa2, made nfsa1 primary again and ran fsck.  It found lots of errors! 
> > Mostly duplicate blocks in inodes.
> 
> The current CVS version is unstable and corrupts the data; this is
> reproducible on our test setups too, and on Philipp's also. It's a work
> in progress right now - hopefully fixed RSN ;-)

With the CVS version from Friday morning, I cannot reproduce the corruption,
even with significant activity and several node switchovers.  Good!

I noticed a couple things though:

1. After a manual "drbdadm disconnect", the *first* subsequent attempt to
"connect" produces a stack trace and the device stays "StandAlone".  The
second attempt to "connect" succeeds, and apparently no other harm is done.
Maybe it is triggered by the fact that both nodes were made primary (to
check data integrity), and then the "wrong" one was manually made
secondary before trying to connect them again.

[sorry for possibly clipped lines / lost characters]

Jun 19 18:33:23 nfsa2 kernel: drbd0: Connection established.
Jun 19 18:33:23 nfsa2 kernel: drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
Jun 19 18:33:23 nfsa2 kernel: drbd0: error receiving ReportParams, l: 68!
Jun 19 18:33:23 nfsa2 kernel: drbd0: worker terminated
Jun 19 18:33:23 nfsa2 kernel: drbd0: asender terminated
Jun 19 18:33:23 nfsa2 kernel: drbd0: Connection lost.
Jun 19 18:33:23 nfsa2 kernel: drbd0: receiver terminated
Jun 19 18:33:37 nfsa2 kernel: drbd0: drivers/block/drbd/drbd_receiver.c:1314: bitmap already locked by drivers/block/drbd/drbd_receiver.c:1314
Jun 19 18:33:37 nfsa2 kernel: Call Trace:
Jun 19 18:33:37 nfsa2 kernel:  [<c023eca8>] __drbd_bm_lock+0xf4/0x13c
Jun 19 18:33:37 nfsa2 kernel:  [<c024a965>] receive_param+0x188/0x5f8
Jun 19 18:33:37 nfsa2 kernel:  [<c0248937>] drbd_recv_header+0x2b/0xe6
Jun 19 18:33:37 nfsa2 kernel:  [<c024b523>] drbdd+0x45/0xca
Jun 19 18:33:37 nfsa2 kernel:  [<c024bc54>] drbdd_init+0x69/0x163
Jun 19 18:33:37 nfsa2 kernel:  [<c0251120>] drbd_thread_setup+0x78/0xde
Jun 19 18:33:37 nfsa2 kernel:  [<c02510a8>] drbd_thread_setup+0x0/0xde
Jun 19 18:33:37 nfsa2 kernel:  [<c0102271>] kernel_thread_helper+0x5/0xb
Jun 19 18:33:37 nfsa2 kernel:
Jun 19 18:33:37 nfsa2 kernel: drbd0: Connection established.
Jun 19 18:33:37 nfsa2 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Jun 19 18:33:37 nfsa2 kernel: drbd0: Resync started as SyncSource (need to sync 593460 KB [148365 bits set]).
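
As a quick sanity check on the last log line, the two figures in brackets can be compared directly. The sketch below only uses the numbers printed in that line; the variable names are made up for illustration, and the per-bit granularity is inferred from the log, not taken from the DRBD source.

```python
# Figures copied from the "Resync started as SyncSource" log line above.
kb_to_sync = 593460   # "need to sync 593460 KB"
bits_set = 148365     # "[148365 bits set]"

# Inferred granularity: how many KB one set bit in the bitmap stands for.
# This is derived from the log figures only (an assumption, not DRBD's API).
kb_per_bit, remainder = divmod(kb_to_sync, bits_set)

print(f"{kb_per_bit} KB per set bit, remainder {remainder} KB")
```

If the remainder is zero, the amount to sync is an exact multiple of the number of set bits, which suggests the reported KB figure is computed straight from the bitmap.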

Second thing: when I did "invalidate all", there were a lot of messages
like these (some characters were lost on the serial console):

Jun  18:37:01 nfsa1ernel: drbd0: dd_bm_clear_bit:leared a bitnr=30113 while Concted
Jun 19 187:01 nfsa1 kern: drbd0: drbd_bclear_bit: clead a bitnr=23301 while Connecte

and for several seconds /proc/drbd shows "99%" synched.  After that, the
messages stop, a "Resync started" message appears, and /proc/drbd starts
showing reasonable figures.

Eugene



