Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Tue, Jan 13, 2009 at 01:06:20PM -0500, Gennadiy Nerubayev wrote:
> On Fri, Dec 19, 2008 at 1:39 PM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
>
> > On Fri, Dec 19, 2008 at 09:24:32AM -0500, Gennadiy Nerubayev wrote:
> > > On Thu, Dec 18, 2008 at 1:50 PM, Lars Ellenberg
> > > <lars.ellenberg at linbit.com> wrote:
> >
> > uh. oh.
> > I have to admit that this was probably not really realistic.
> > It was sort of only writing 500MB (O_DIRECT, "synchronously"), as that
> > was what fit into the controller cache, and it finished subsecond.
> > That's where the number comes from ;)
> > I don't have a real storage backend in the lab capable of sustained
> > writes in that performance range. (yet.)
> >
> > But nothing special, actually, I think:
> > it was jumbo frames, disabling flow control, huge max-buffers
> > and the like that did the trick, mostly, as well as allowing more than
> > one core (as one single CPU was maxed out sometimes).
>
> Small update:
>
> 500MB/s makes sense if it's a single burst. What I'm finding is that
> during a long sync, the speed fluctuates wildly, even though neither the
> network link nor the storage exhibits such fluctuations on its own. I
> made a graph showing this effect during a sync lasting ~40 minutes. A
> script ran "cat /proc/drbd" every second, taking the first speed value.
> The average after the first minute or two stabilized at ~385MB/s:

forget the "first speed value" in /proc/drbd.

the way it is calculated now, DRBD takes a sample of the yet-to-be-synced
bits every ten seconds, i.e. it stores (resync_left, jiffies_at_sample_time).
then, when you read /proc/drbd, it calculates the "current" sync speed from
that in a straightforward way.

but mind you: if that calculation happens only a jiffy after the sample
time, you probably get a sync rate of either zero (in case resync_left has
not changed during that jiffy), or a HUGE number (because there may have
been a resync_left update in exactly that jiffy). see the arithmetic sketch
below.

we used to have "rolling averages" there, some years ago, but they got
lost later for no particular reason.

it is a very imprecise, rough estimate; don't mistake it for a measurement.

if you want to actually graph something DRBD related, sample the numbers for
  dw, dr, ns, nr  (counters, unit kB: disk write/read, net send/receive)
  al, bm          (counters: activity log and bitmap meta data writes, in requests)
  oos             (gauge: number of out-of-sync kB)
and maybe
  ap, lo, pe, ua  (gauges, not that interesting unless fine-tuning by experts).

> There's a definite pattern

that pattern is probably a sampling error of a badly behaved (as explained
above) gauge, and absolutely expected. ;)

also, please note that whenever a new piece is cleared completely, the
corresponding part of the bitmap is written, possibly causing a seek and a
short pause during the sync...

do that "experiment" again, but sample oos, and plot
  ( oos[t] - oos[t-3] ) / 3
(sketched below) ... and if there is still much fluctuation, we'll see
what explanation I find for it.

--
: Lars Ellenberg                                http://www.linbit.com :
: DRBD/HA support and consulting                 sales at linbit.com  :
: LINBIT Information Technologies GmbH           Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe          Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.
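
To make the zero-or-HUGE effect described above concrete, here is the bare
arithmetic of that estimate. This is a Python sketch of the calculation as
explained in the message, not the actual DRBD source; the HZ value is an
assumption (it depends on the kernel configuration):

    # sketch of the naive rate estimate: (cleared kB) / (elapsed time)
    HZ = 250  # jiffies per second; kernel-config dependent, assumed here

    def estimated_rate_kb_s(resync_left_at_sample, resync_left_now, jiffies_elapsed):
        seconds = jiffies_elapsed / float(HZ)
        return (resync_left_at_sample - resync_left_now) / seconds

    # reading /proc/drbd one jiffy after the ten-second sample:
    print(estimated_rate_kb_s(1048576, 1048576, 1))  # 0.0       -- nothing cleared in that jiffy
    print(estimated_rate_kb_s(1048576, 1044480, 1))  # 1024000.0 -- one 4096 kB update: a HUGE number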
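
A minimal sketch of sampling the fields listed above, assuming the 8.x-style
/proc/drbd layout ("ns:... nr:... dw:... dr:... al:... bm:... oos:...") and a
single resource; the helper name is illustrative:

    import re

    # counters (monotonic): ns nr dw dr (kB), al bm (requests)
    # gauges (instantaneous): oos (out-of-sync kB), ap lo pe ua
    WANTED = {"ns", "nr", "dw", "dr", "al", "bm", "oos", "ap", "lo", "pe", "ua"}

    def drbd_stats(path="/proc/drbd"):
        """Return {name: value} for the wanted fields of a single resource."""
        with open(path) as f:
            text = f.read()
        return {k: int(v) for k, v in re.findall(r"(\w+):(\d+)", text)
                if k in WANTED}

    if __name__ == "__main__":
        print(drbd_stats())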
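
And a sketch of the suggested experiment: sample oos once per second and
print the windowed rate ( oos[t] - oos[t-3] ) / 3. Since oos shrinks as the
resync proceeds, the older-minus-newer difference gives the positive kB/s
rate; the one-second interval matches the original script:

    import re
    import time
    from collections import deque

    def read_oos(path="/proc/drbd"):
        """Read the oos gauge (out-of-sync kB) from /proc/drbd."""
        with open(path) as f:
            m = re.search(r"\boos:(\d+)", f.read())
        return int(m.group(1)) if m else None

    def main(window=3):
        samples = deque(maxlen=window + 1)  # holds oos[t-3] .. oos[t]
        while True:
            oos = read_oos()
            if oos is not None:
                samples.append(oos)
            if len(samples) == window + 1:
                # older minus newer: positive kB/s while the resync progresses
                print("%.0f kB/s" % ((samples[0] - samples[-1]) / float(window)))
            time.sleep(1)

    if __name__ == "__main__":
        main()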