OK, I have several DRBD cluster pairs, all using Dell PowerEdge 1950 servers
(twin dual-core Xeons with 2GB RAM, 2x 250GB SATA disks on PERC 5 RAID-1).
I'm running DRBD version 8.0.0 pre 4, as shipped with Mandriva 2007.1, using
heartbeat to manage failover.
eth0 on each server is connected to a Cisco 2950 and acts as the main service
LAN.
eth1 on each server is connected via GigE crossover, for DRBD replication.
Heartbeat uses both interfaces for health checking.
I'm running PostgreSQL and another application from the DRBD partition, but
neither is doing much yet since we're just testing things. (In fact,
PostgreSQL is all but idle, and the other app is just polling a bunch of
servers every few minutes.)
/etc/drbd.conf:
##
global {
    usage-count yes;
}
common {
    syncer { rate 100M; }
}
resource r0 {
    protocol C;
    handlers {
        pri-on-incon-degr "halt -f";
        pri-lost-after-sb "halt -f";
        outdate-peer "/usr/sbin/drbd-peer-outdater";
    }
    startup {
    }
    disk {
        on-io-error detach;
    }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
    }
    syncer {
        rate 100M;
        al-extents 257;
    }
    on server1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 172.16.1.1:7788;
        meta-disk internal;
    }
    on server2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 172.16.1.2:7788;
        meta-disk internal;
    }
}
##
Quite frequently (say every 90 secs or so) all I/O on the DRBD device seems to
stall - interactive SSH sessions will hang for ~15 seconds.
Generally speaking, vmstat shows 25% I/O wait, while top shows one of the 4
CPUs is at 100% I/O wait for extended periods. However, vmstat is also
reporting that the actual data transferred is negligible - mostly it's <100
bytes/sec.
Top is not obviously showing any processes that may be causing this issue.
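
For anyone wanting to cross-check those two figures, here is a rough Python
sketch (not something from my actual setup) that computes the I/O wait share
per CPU from two /proc/stat snapshots. The function name and the BEFORE/AFTER
sample counters are made up for illustration; they are chosen so that one CPU
of four sits entirely in I/O wait, which averages out to 25% overall, matching
what vmstat and top report above.

```python
def cpu_iowait_percent(before, after):
    """Return {cpu_name: iowait %} between two /proc/stat text snapshots."""
    def parse(text):
        stats = {}
        for line in text.splitlines():
            fields = line.split()
            # Keep the aggregate "cpu" line and the per-CPU "cpuN" lines only.
            if fields and fields[0].startswith("cpu"):
                stats[fields[0]] = [int(v) for v in fields[1:]]
        return stats

    first, second = parse(before), parse(after)
    result = {}
    for name, new in second.items():
        old = first.get(name)
        if old is None:
            continue
        # Field order: user nice system idle iowait irq softirq steal ...
        # Only the first eight are summed, so guest time (already included
        # in user) is not counted twice on newer kernels.
        deltas = [n - o for n, o in zip(new, old)][:8]
        total = sum(deltas)
        result[name] = 100.0 * deltas[4] / total if total else 0.0
    return result


# Synthetic counters (illustrative only, not measured on these servers):
# cpu3 spends the whole interval in iowait, the other three are idle.
BEFORE = """cpu  400 0 400 3200 0 0 0 0
cpu0 100 0 100 800 0 0 0 0
cpu1 100 0 100 800 0 0 0 0
cpu2 100 0 100 800 0 0 0 0
cpu3 100 0 100 800 0 0 0 0"""

AFTER = """cpu  400 0 400 3500 100 0 0 0
cpu0 100 0 100 900 0 0 0 0
cpu1 100 0 100 900 0 0 0 0
cpu2 100 0 100 900 0 0 0 0
cpu3 100 0 100 800 100 0 0 0"""

for name, pct in sorted(cpu_iowait_percent(BEFORE, AFTER).items()):
    print(name, pct)
```

On a live box the two snapshots would come from reading /proc/stat twice, a
few seconds apart, ideally spanning one of the stalls.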
Periodically, and not on the same frequency as the stalls, I see the following
in my syslog:
Sep 16 14:49:22 server1 kernel: drbd0: ASSERT( b->n_req == set_size ) in drivers/block/drbd/drbd_main.c:299
Sep 16 14:49:22 server1 kernel: drbd0: b->n_req = 2 in drivers/block/drbd/drbd_main.c:307
Sep 16 14:49:22 server1 kernel: drbd0: set_size = 1 in drivers/block/drbd/drbd_main.c:308
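
To put a number on "not on the same frequency as the stalls", something like
the following Python sketch could list the gaps between successive ASSERT
messages. It is only a sketch: the function name is hypothetical, the regular
expression assumes the syslog line layout shown above (with each entry on one
line), and since syslog timestamps carry no year, the year is a parameter
that defaults to 2007 purely as an assumption.

```python
import re
from datetime import datetime

# Matches the first line of each DRBD assertion report, e.g.
# "Sep 16 14:49:22 server1 kernel: drbd0: ASSERT( ... ) in ..."
ASSERT_LINE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: drbd\d+: ASSERT\("
)


def assert_intervals(lines, year=2007):
    """Return the seconds elapsed between consecutive DRBD ASSERT lines."""
    stamps = []
    for line in lines:
        match = ASSERT_LINE.match(line)
        if match:
            # Prepend the assumed year so the timestamp parses unambiguously.
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Feeding it the lines of the syslog file (the path varies by distribution)
gives a list of gaps in seconds, which can then be compared against the
roughly 90 second stall interval.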
Is there anything I can do to improve things? The raw horsepower these boxes
have shouldn't be giving me anything like the I/O stalls I'm seeing.
Mark.
--
Mark Watts BSc RHCE MBCS
Senior Systems Engineer
QinetiQ Trusted Information Management
Trusted Solutions and Services Group
GPG Key: http://keyserver.veridis.com:11371/search?q=0x455420ED