Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2005-09-14 18:49:03 +0200
\ Diego Liziero:
> Hello,
> today we tried to update our drbd 0.6.x system to 0.7.13
> using a free disk partition as meta-disk.
>
> We followed all the update instructions and we got the first
> 5 drbd partitions in sync with the new 0.7.13 version.
>
> While the 6th and last drbd partition was syncing, we first noticed a
> slowdown.
>
> The bitrate went down from 480Mbit/sec to about 60Mbit/sec.
>
> The link between the 2 nodes of the cluster is a dedicated gigabit
> ethernet link used only by drbd, we noticed and measured
> this slowdown using iptraf.

note that to the best of my knowledge iptraf rate measurement is buggy.
we recently tried to measure performance of iSCSI initiators/targets,
and nearly went up the wall when we recognized after hours of fruitless
tuning that the measurement was broken....

> The last partition is the bigger one (250G), and after 10% of the
> resync process, the primary cluster hung. The console was black,
> the keyboard not responding, we had to press the reset button.

this however is interesting.
does this device sync successfully
 - if it is the only configured device?
 - if you configure fewer devices?
 - if you reorder it, i.e. it comes first instead of last?
 - if you reorder sync groups, i.e. it is not synced last?

> We tried this process various times, and with different versions
> of the 2.4.21smp kernel and all with a new (recompiled
> each time) 0.7.13 drbd module.
>
> In all cases we got a system hang during the resync, sometimes
> with a slowdown of the sync rate some minutes before the hang.
>
> In one case we were able to see an Oops message on the console,
> but unfortunately just the last lines were visible
> (I remember something about tasker, irq and smp)
> and shift-pageup was not working.

try to grab that with a serial console.
==> make sure you have NMI watchdog enabled in your kernel <==
to better detect deadlocks.

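
a quick way to verify both suggestions before the next sync attempt is to
look at the kernel command line. the sketch below is only an illustration,
not something from the original thread: it assumes the usual boot
parameters `console=ttyS<N>,...` for a serial console and
`nmi_watchdog=<mode>` for the watchdog, and the helper name
`check_cmdline` is made up here. it only inspects boot parameters; it does
not prove the watchdog is actually ticking (an increasing NMI count in
/proc/interrupts is a better sign of that), and which watchdog mode works
depends on the hardware.

```python
import re


def check_cmdline(cmdline: str) -> dict:
    """Report whether a kernel command line asks for a serial console
    and for the NMI watchdog. Hypothetical helper for illustration."""
    params = cmdline.split()

    # A serial console shows up as console=ttyS0,115200n8 or similar.
    serial = any(re.match(r"console=ttyS\d+", p) for p in params)

    # nmi_watchdog=0 explicitly disables the watchdog; any other
    # non-empty value is treated as "requested" here.
    watchdog = False
    for p in params:
        if p.startswith("nmi_watchdog="):
            value = p.split("=", 1)[1]
            watchdog = value not in ("", "0")

    return {"serial_console": serial, "nmi_watchdog": watchdog}


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(check_cmdline(f.read()))
    except OSError as err:
        print("cannot read /proc/cmdline:", err)
```

if either value comes back False, add the missing parameter to the
boot loader entry and reboot before retrying the resync.
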
> Our system is a cluster with 2 servers each one with 4 Xeon
> processors and 7 GB of RAM.
>
> The same kernel version works fine with drbd 0.6.12

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82  :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.