Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hey guys,
So we've recently upgraded to DRBD 8.4.2, and have been noticing some... odd behavior. Here's a sar extract for a glitch we noticed last night:
12:00:01 AM       DEV     tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz    await  svctm  %util
10:08:01 PM  dev251-0  833.61  43883.30  13928.83     69.35      1.21     1.46   0.07   6.11
10:08:01 PM  dev147-0  829.07  43883.30  13890.15     69.68   2119.54  1126.62   0.84  69.91
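(That came out of our daily sysstat logs; the exact invocation would have been something like the following, with the sa file name and time window adjusted to taste:)

sar -d -f /var/log/sa/sa14 -s 22:00:00 -e 22:10:00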
dev251-0 is the backing device, and dev147-0 is the corresponding DRBD device. I understand DRBD adds some overhead, but going from 6% to 70% utilization, with await north of a second and an average queue of 2000+ requests, seems a little wrong. For reference, here's the DRBD config for that device, from drbdsetup:
resource edb {
    options {
    }
    net {
        max-buffers     131072;
        verify-alg      "md5";
    }
    _remote_host {
        address         ipv4 10.2.128.207:7788;
    }
    _this_host {
        address         ipv4 10.2.128.208:7788;
        volume 0 {
            device      minor 0;
            disk        "/dev/fioa";
            meta-disk   internal;
            disk {
                fencing         resource-only;
                disk-flushes    no;
                md-flushes      no;
                resync-rate     307200k; # bytes/second
                c-fill-target   6144s; # bytes
                c-max-rate      307200k; # bytes/second
            }
        }
    }
}
I've adjusted the c-fill-target based on a 10G link with a 0.1ms average ping time, and set max-buffers to its maximum for similar reasons. And while that did clear up some of the issues we were seeing, every once in a while we still get spikes like the one in the sar extract above.
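For anyone checking my math there: the starting point was a plain bandwidth-delay product, and everything beyond that is headroom I padded on top (my own fudge factor, nothing out of the DRBD docs):

# 10 Gbit/s link at ~0.1 ms RTT:
# BDP = 10^10 bits/s * 0.0001 s / 8 = 125,000 bytes, or ~244 512-byte sectors
echo $(( 10**10 / 10000 / 8 / 512 ))   # -> 244
# c-fill-target 6144s (3 MiB) is that raw BDP padded up roughly 25x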
Am I missing something?
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas at optionshouse.com