On Tue, Jan 13, 2009 at 4:52 PM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Tue, Jan 13, 2009 at 01:06:20PM -0500, Gennadiy Nerubayev wrote:
> > On Fri, Dec 19, 2008 at 1:39 PM, Lars Ellenberg
> > <lars.ellenberg at linbit.com> wrote:
> >
> > > On Fri, Dec 19, 2008 at 09:24:32AM -0500, Gennadiy Nerubayev wrote:
> > > > On Thu, Dec 18, 2008 at 1:50 PM, Lars Ellenberg
> > > > <lars.ellenberg at linbit.com> wrote:
> >
> > Small update:
> >
> > 500MB/s makes sense if it's a single burst. What I'm finding is that during
> > a long sync, the speed fluctuates wildly, even though neither the network
> > link nor the storage exhibit such fluctuations on their own. I made a graph
> > showing this effect during a sync lasting ~40 minutes. A script ran
> > cat /proc/drbd every second, taking the first speed value. The average after
> > the first minute or two stabilized at ~385MB/s:
>
> forget the "first speed value" in /proc/drbd
> the way it is calculated now, it takes a
> sample of yet-to-be-synced bits every ten seconds.
>
> so (resync_left, jiffies_at_sample_time)
>
> then, when you read /proc/drbd, it calculates the "current" sync speed
> straight forward.
> but mind you, if that calculation happens only a jiffy after that sample
> time, you probably get a sync rate of either zero (in case during that
> jiffy resync_left has not changed), or a HUGE number (because
> there may have been a resync_left update in exactly that jiffy).
>
> we used to have "rolling averages" there, at some point years ago,
> but they got lost later for no particular reason.
> it is a very imprecise rough estimate,
> don't mistake it for a measurement.
>
> if you want to actually graph something drbd related,
> sample the numbers for dw, dr, ns, nr
> (counters, unit kB, disk write/read, net send/receive)
> al, bm
> (counters, activity log and bitmap meta data write counts in requests)
> oos (gauge: number of out-of-sync kB)
> and maybe ap, lo, pe, ua
> (gauges, not that interesting unless finetuning by experts).
>
> > There's a definite pattern
>
> that pattern is probably a sampling error of a badly behaved
> (as explained above) gauge, and absolutely expected. ;)
>
> also, please note that whenever a new piece is cleared completely,
> the corresponding part of the bitmap is written,
> possibly causing a seek and a short pause during sync...
>
> do that "experiment" again, but sample oos,
> and plot ( oos[t] - oos[t-3] ) / 3 ...

Doh! You're right; by simply graphing how oos decreases every second, I can
see that it's really uniform, varying around ~375-400MB/s with no spikes
whatsoever. Going to dig at this some more.

Thanks,

-Gennadiy
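For anyone wanting to repeat the measurement, here is a minimal sketch of the oos-based calculation suggested in the thread. It is not taken from DRBD or from either poster's script. The helper names (parse_counters, resync_rates) are hypothetical, and the parser assumes that /proc/drbd prints its counters as whitespace-separated "key:number" tokens and that only the first device is of interest. Because oos shrinks during a resync, the difference is taken as oos[t-3] - oos[t] so that the rate comes out positive.

```python
import re

# Field names mentioned in the thread; the "key:number" token layout is an
# assumption about how /proc/drbd prints them.
FIELDS = ("ns", "nr", "dw", "dr", "al", "bm", "lo", "pe", "ua", "ap", "oos")
_PAIR = re.compile(r"\b([a-z]+):(\d+)\b")


def parse_counters(text):
    """Return {field: int} using the first occurrence of each known field."""
    found = {}
    for key, value in _PAIR.findall(text):
        if key in FIELDS and key not in found:
            found[key] = int(value)
    return found


def resync_rates(oos_samples, interval_s=1.0, window=3):
    """Resync rate in kB/s from evenly spaced samples of the oos gauge.

    Computes (oos[t - window] - oos[t]) / (window * interval_s) for every t
    that has a sample `window` steps earlier.
    """
    return [
        (oos_samples[t - window] - oos_samples[t]) / (window * interval_s)
        for t in range(window, len(oos_samples))
    ]
```

To use it, read /proc/drbd once per second, append parse_counters(text)["oos"] to a list, and plot resync_rates(list). Dividing the result by 1024 gives MB/s.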