<html><head><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"></head><body><div><div style="font-family: Calibri,sans-serif; font-size: 11pt;"><br></div></div><hr><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">From: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">Lars Ellenberg</span><br><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">Sent: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">5/30/2014 6:43 AM</span><br><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">To: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">drbd-user@lists.linbit.com</span><br><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">Subject: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">Re: [DRBD-user] Adjusting al-extents on-the-fly</span><br><br>On Wed, May 28, 2014 at 01:23:55PM +1000, Stuart Longland wrote:<br>> Hi Lars,<br>> On 27/05/14 20:31, Lars Ellenberg wrote:<br>> >> The system logs PLC-generated process data every 5 seconds, and at two<br>> >> times of the day, at midnight and midday, it misses a sample with the<br>> >> logging taking 6 seconds. There's no obvious CPU spike at this time, so<br>> >> my hunch is I/O, and so I'm looking at ways to try and improve this.<br>> > <br>> > Funny how if "something" happens,<br>> > and there is DRBD anywhere near it,<br>> > it is "obviously" DRBD's fault, naturally.<br>> <br>> No, it's not "obviously" DRBD's fault. It is a factor, as is the CPU.<br>> Rather, it's the network and/or disk, of which DRBD is reliant both of<br>> these, and (to a lesser extent) CPU time.<br>> <br>> I'm faced with a number of symptoms, and so it is right I consider *all*<br>> factors, including DRBD and the I/O subsystems that underpin it.<br><br>Okok ...<br><br>> >> iotop didn't show any huge spikes that I'd imagine the disks would have<br>> >> trouble with. Then again, since it's effectively polling, I could have<br>> >> "blinked" and missed it.<br>> > <br>> > If your data gathering and logging thingy misses a sample<br>> > because of the logging to disk (assuming for now that this is in fact<br>> > what happens), you are still doing it wrong.<br>> > <br>> > Make the data sampling asynchronous wrt. flushing data to disk.<br>> <br>> Sadly how it does the logging is outside my control. The SCADA package<br>> is one called MacroView, and is made available for a number of platforms<br>> under a proprietary license. I do not have the source code, however it<br>> has been used successfully on quite a large number of systems.<br>> <br>> The product has been around since the late 80's on numerous Unix<br>> variants. Its methods may not be "optimal", but they seem to work well<br>> enough in a large number of cases.<br>> <br>> The MacroView Historian basically reads its data from shared memory<br>> segments exported by PLC drivers, computes whatever summary data is<br>> needed then writes this out to disk. So the process is both I/O and<br>> possibly CPU intensive.<br>> <br>> I can't do much about the CPU other than fiddling with `nice` without a<br>> hardware upgrade (which may yet happen; time will tell).<br>> <br>> I don't see the load-average sky rocketing which is why I suspected I/O:<br>> either disk writes that are being bottle-necked by the gigabit network<br>> link, or perhaps the disk controller.<br>> <br>> The DRBD installation there was basically configured and gotten to a<br>> working state, there was a little monkey-see-monkey-do learning in the<br>> beginning, so it's possible that performance can be enhanced with a<br>> little tweaking.<br>> <br>> The literature suggests a number of parameters are dependent on the<br>> hardware used, and this, is what I'm looking into.<br>> <br>> This is one possibility I am investigating: being mindful that this is a<br>> live production cluster that I'm working on. Thus I have to be careful<br>> what I adjust, and how I adjust it.<br><br>Sure.<br><br>Well, IO subsystems may have occasional latency spikes.<br>DRBD may trigger, be responsible for, or even cause<br>additional latency spikes.<br><br>IF your scada would "catch one sample then synchronously log it",<br>particular high latency spikes might cause it to miss the next samle.<br><br>I find that highly unlikely.<br>Both that sampling and logging would be so tightly coupled,<br>and that the latency spike would take that long (if nothing else is<br>going on, and the system is not completely overloaded;<br>with really loaded systems, arbitrary queue length and buffer bloat,<br>I can easily make the latency spike for minutes).<br><br>As this is "pro" stuff, I think it is safe to assume<br>that gathering data, and logging that data, is not so tightly coupled.<br>Which leads me to believe that it missing a sample<br>has nothing to do with persisting the previous sample(s) to disk.<br><br>Especially if it happens so regularly twice a day noon and midnight.<br>What is so "special" about those times?<br>flushing logs? log rotation?<br><br>You wrote "with the logging taking 6 seconds".<br>What exactly does that mean?<br>"the logging"?<br>"taking 6 seconds"?<br>what exactly takes six seconds?<br>how do you know?<br><br>Are some clocks slightly off<br>and get adjusted twice a day?<br><br>> >> DR:BD is configured with a disk partition on a RAID array as its backing<br>> > <br>> > Wrong end of the system to tune in this case, imo.<br>> <br>> Well, hardware configuration and BIOS settings are out of my reach as<br>> I'm in Brisbane and the servers in question are somewhere in Central<br>> Queensland some 1000km away.<br>> <br>> > This (adjusting of the "al-extents" only) is a rather boring command<br>> > actually. It may stall IO on a very busy backend a bit,<br>> > changes some internal "caching hash table size" (sort of),<br>> > and continues.<br>> <br>> Does the change of the internal 'caching hash table size' do anything<br>> destructive to the DR:BD volume?<br><br>No. Really.<br>Why would we do something destructive to your data<br>because you change some syncronisation parameter.<br>And I even just wrote it was "boring, at most briefly stalls then<br>continues IO". I did not write<br>it-will-reformat-and-panic-the-box-be-careful-dont-use.<br><br>But unless your typical working set size is much larger than what<br>the current setting covered, this is unlikely to help.<br>(257 al-extents correspond to ~ 1GByte working set)<br>If it is not about the size, but the change rate, of your working set,<br>you will need to upgrade to drbd 8.4.<br><br>> http://www.drbd.org/users-guide-8.3/re-drbdsetup.html mentions that<br>> --create-device "In case the specified DRBD device (minor number) does<br>> not exist yet, create it implicitly."<br>> <br>> Unfortunately to me "device" is ambiguous, is this the block device file<br>> in /dev, or the actual logical DR:BD device (i.e. the partition).<br><br>So what. "In case .* does not exist yet".<br>Well, it does exist.<br>So that's a no-op, right?<br><br>Anyways. That flag is passed from drbdadm to drbdsetup *always*<br>(in your drbd version).<br>And it does no harm. Not even to your data.<br>It's an internal convenience flag.<br><br>> I don't want to create a new device, I just want to re-use the existing<br>> one that's there and keep its data.<br>> <br>> > As your server seems to be rather not-so-busy, IO wise,<br>> > I don't think this will even be noticable.<br>> <br>> Are there other parameters that I should be looking at?<br><br>If this is about DRBD tuning,<br>well, yes, there are many things to consider.<br>If there were just one optimal set of values,<br>those would be hardcoded, and not tunables.<br><br>> Sync-rates perhaps?<br><br>Did you have resync going on during your "interesting" times?<br>If not, why bother, at this time, for this issue.<br>If yes, why would you always resync at noon and midnight?<br><br>> Once again, the literature suggests this should be higher if the writes<br>> are small and "scattered" in nature, which given we're logging data from<br>> numerous sources, I'd expect to be the case.<br><br>Sync rate is not relevant at all here.<br>Those parameters control the background resynchronization<br>after connection loss and re-establishment.<br>As I understand, your DRBD is healthy, connected,<br>and happily replicating. No resync.<br><br>> Thus following the documentation's recommendations (and not being an<br>> expert myself) I figured I'd try carefully adjusting that figure to<br>> something more appropriate.<br><br>Sure, careful is good.<br>Test system is even better ;-)<br><br>If you really want to improve on random write latency with DRBD,<br>you need to upgrade to 8.4. (8.4.5 will be released within days).<br><br>I guess that upgrade is too scary for such a system?<br><br>Also, you could use auditctl to find out in detail what is happenening<br>on your system. You likely want to play with that on a test system first<br>as well, until you get the event filters right,<br>or you could end up spamming your production systems logs.<br><br>-- <br>: Lars Ellenberg<br>: LINBIT | Your Way to High Availability<br>: DRBD/HA support and consulting http://www.linbit.com<br><br>DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.<br>__<br>please don't Cc me, but send to list -- I'm subscribed<br>_______________________________________________<br>drbd-user mailing list<br>drbd-user@lists.linbit.com<br>http://lists.linbit.com/mailman/listinfo/drbd-user<br></body></html>