<html><head><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"></head><body><div><div style="font-family: Calibri,sans-serif; font-size: 11pt;"><br></div></div><hr><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">From: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">Lars Ellenberg</span><br><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">Sent: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">5/30/2014 6:43 AM</span><br><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">To: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">drbd-user@lists.linbit.com</span><br><span style="font-family: Tahoma,sans-serif; font-size: 10pt; font-weight: bold;">Subject: </span><span style="font-family: Tahoma,sans-serif; font-size: 10pt;">Re: [DRBD-user] Adjusting al-extents on-the-fly</span><br><br>On Wed, May 28, 2014 at 01:23:55PM +1000, Stuart Longland wrote:<br>&gt; Hi Lars,<br>&gt; On 27/05/14 20:31, Lars Ellenberg wrote:<br>&gt; &gt;&gt; The system logs PLC-generated process data every 5 seconds, and at two<br>&gt; &gt;&gt; times of the day, at midnight and midday, it misses a sample with the<br>&gt; &gt;&gt; logging taking 6 seconds.&nbsp; There's no obvious CPU spike at this time, so<br>&gt; &gt;&gt; my hunch is I/O, and so I'm looking at ways to try and improve this.<br>&gt; &gt; <br>&gt; &gt; Funny how if "something" happens,<br>&gt; &gt; and there is DRBD anywhere near it,<br>&gt; &gt; it is "obviously" DRBD's fault, naturally.<br>&gt; <br>&gt; No, it's not "obviously" DRBD's fault.&nbsp; It is a factor, as is the CPU.<br>&gt; Rather, it's the network and/or disk, of which DRBD is reliant both of<br>&gt; these, and (to a lesser extent) CPU time.<br>&gt; <br>&gt; I'm faced with a number of symptoms, and so it is right I consider *all*<br>&gt; factors, including DRBD and the I/O subsystems that 
underpin it.<br><br>Okok ...<br><br>&gt; &gt;&gt; iotop didn't show any huge spikes that I'd imagine the disks would have<br>&gt; &gt;&gt; trouble with.&nbsp; Then again, since it's effectively polling, I could have<br>&gt; &gt;&gt; "blinked" and missed it.<br>&gt; &gt; <br>&gt; &gt; If your data gathering and logging thingy misses a sample<br>&gt; &gt; because of the logging to disk (assuming for now that this is in fact<br>&gt; &gt; what happens), you are still doing it wrong.<br>&gt; &gt; <br>&gt; &gt; Make the data sampling asynchronous wrt. flushing data to disk.<br>&gt; <br>&gt; Sadly how it does the logging is outside my control.&nbsp; The SCADA package<br>&gt; is one called MacroView, and is made available for a number of platforms<br>&gt; under a proprietary license.&nbsp; I do not have the source code, however it<br>&gt; has been used successfully on quite a large number of systems.<br>&gt; <br>&gt; The product has been around since the late 80's on numerous Unix<br>&gt; variants.&nbsp; Its methods may not be "optimal", but they seem to work well<br>&gt; enough in a large number of cases.<br>&gt; <br>&gt; The MacroView Historian basically reads its data from shared memory<br>&gt; segments exported by PLC drivers, computes whatever summary data is<br>&gt; needed then writes this out to disk.&nbsp; So the process is both I/O and<br>&gt; possibly CPU intensive.<br>&gt; <br>&gt; I can't do much about the CPU other than fiddling with `nice` without a<br>&gt; hardware upgrade (which may yet happen; time will tell).<br>&gt; <br>&gt; I don't see the load-average sky rocketing which is why I suspected I/O:<br>&gt; either disk writes that are being bottle-necked by the gigabit network<br>&gt; link, or perhaps the disk controller.<br>&gt; <br>&gt; The DRBD installation there was basically configured and gotten to a<br>&gt; working state, there was a little monkey-see-monkey-do learning in the<br>&gt; beginning, so it's possible that performance can be enhanced with 
a<br>&gt; little tweaking.<br>&gt; <br>&gt; The literature suggests a number of parameters are dependent on the<br>&gt; hardware used, and this, is what I'm looking into.<br>&gt; <br>&gt; This is one possibility I am investigating: being mindful that this is a<br>&gt; live production cluster that I'm working on.&nbsp; Thus I have to be careful<br>&gt; what I adjust, and how I adjust it.<br><br>Sure.<br><br>Well, IO subsystems may have occasional latency spikes.<br>DRBD may trigger, be responsible for, or even cause<br>additional latency spikes.<br><br>IF your scada would "catch one sample then synchronously log it",<br>particularly high latency spikes might cause it to miss the next sample.<br><br>I find that highly unlikely.<br>Both that sampling and logging would be so tightly coupled,<br>and that the latency spike would take that long (if nothing else is<br>going on, and the system is not completely overloaded;<br>with really loaded systems, arbitrary queue length and buffer bloat,<br>I can easily make the latency spike last for minutes).<br><br>As this is "pro" stuff, I think it is safe to assume<br>that gathering data, and logging that data, is not so tightly coupled.<br>Which leads me to believe that it missing a sample<br>has nothing to do with persisting the previous sample(s) to disk.<br><br>Especially if it happens so regularly twice a day, at noon and midnight.<br>What is so "special" about those times?<br>flushing logs?&nbsp; log rotation?<br><br>You wrote "with the logging taking 6 seconds".<br>What exactly does that mean?<br>"the logging"?<br>"taking 6 seconds"?<br>what exactly takes six seconds?<br>how do you know?<br><br>Are some clocks slightly off<br>and get adjusted twice a day?<br><br>&gt; &gt;&gt; DR:BD is configured with a disk partition on a RAID array as its backing<br>&gt; &gt; <br>&gt; &gt; Wrong end of the system to tune in this case, imo.<br>&gt; <br>&gt; Well, hardware configuration and BIOS settings are out of my reach as<br>&gt; I'm in Brisbane 
and the servers in question are somewhere in Central<br>&gt; Queensland some 1000km away.<br>&gt; <br>&gt; &gt; This (adjusting of the "al-extents" only) is a rather boring command<br>&gt; &gt; actually.&nbsp; It may stall IO on a very busy backend a bit,<br>&gt; &gt; changes some internal "caching hash table size" (sort of),<br>&gt; &gt; and continues.<br>&gt; <br>&gt; Does the change of the internal 'caching hash table size' do anything<br>&gt; destructive to the DR:BD volume?<br><br>No.&nbsp; Really.<br>Why would we do something destructive to your data<br>because you change some synchronisation parameter?<br>And I even just wrote it was "boring, at most briefly stalls then<br>continues IO".&nbsp; I did not write<br>it-will-reformat-and-panic-the-box-be-careful-dont-use.<br><br>But unless your typical working set size is much larger than what<br>the current setting covered, this is unlikely to help.<br>(257 al-extents correspond to ~ 1GByte working set)<br>If it is not about the size, but the change rate, of your working set,<br>you will need to upgrade to drbd 8.4.<br><br>&gt; http://www.drbd.org/users-guide-8.3/re-drbdsetup.html mentions that<br>&gt; --create-device "In case the specified DRBD device (minor number) does<br>&gt; not exist yet, create it implicitly."<br>&gt; <br>&gt; Unfortunately to me "device" is ambiguous, is this the block device file<br>&gt; in /dev, or the actual logical DR:BD device (i.e. the partition).<br><br>So what. "In case .* does not exist yet".<br>Well, it does exist.<br>So that's a no-op, right?<br><br>Anyways.&nbsp; That flag is passed from drbdadm to drbdsetup *always*<br>(in your drbd version).<br>And it does no harm. 
Not even to your data.<br>It's an internal convenience flag.<br><br>&gt; I don't want to create a new device, I just want to re-use the existing<br>&gt; one that's there and keep its data.<br>&gt; <br>&gt; &gt; As your server seems to be rather not-so-busy, IO wise,<br>&gt; &gt; I don't think this will even be noticable.<br>&gt; <br>&gt; Are there other parameters that I should be looking at?<br><br>If this is about DRBD tuning,<br>well, yes, there are many things to consider.<br>If there were just one optimal set of values,<br>those would be hardcoded, and not tunables.<br><br>&gt; Sync-rates perhaps?<br><br>Did you have resync going on during your "interesting" times?<br>If not, why bother, at this time, for this issue.<br>If yes, why would you always resync at noon and midnight?<br><br>&gt; Once again, the literature suggests this should be higher if the writes<br>&gt; are small and "scattered" in nature, which given we're logging data from<br>&gt; numerous sources, I'd expect to be the case.<br><br>Sync rate is not relevant at all here.<br>Those parameters control the background resynchronization<br>after connection loss and re-establishment.<br>As I understand, your DRBD is healthy, connected,<br>and happily replicating. No resync.<br><br>&gt; Thus following the documentation's recommendations (and not being an<br>&gt; expert myself) I figured I'd try carefully adjusting that figure to<br>&gt; something more appropriate.<br><br>Sure, careful is good.<br>Test system is even better ;-)<br><br>If you really want to improve on random write latency with DRBD,<br>you need to upgrade to 8.4. (8.4.5 will be released within days).<br><br>I guess that upgrade is too scary for such a system?<br><br>Also, you could use auditctl to find out in detail what is happening<br>on your system. 
You likely want to play with that on a test system first<br>as well, until you get the event filters right,<br>or you could end up spamming your production system's logs.<br><br>-- <br>: Lars Ellenberg<br>: LINBIT | Your Way to High Availability<br>: DRBD/HA support and consulting http://www.linbit.com<br><br>DRBD&reg; and LINBIT&reg; are registered trademarks of LINBIT, Austria.<br>__<br>please don't Cc me, but send to list&nbsp;&nbsp; --&nbsp;&nbsp; I'm subscribed<br>_______________________________________________<br>drbd-user mailing list<br>drbd-user@lists.linbit.com<br>http://lists.linbit.com/mailman/listinfo/drbd-user<br></body></html>