Ok, I have several DRBD cluster pairs, all using Dell PowerEdge 1950 servers (twin dual-core Xeons with 2GB RAM, 2x 250GB SATA disks on PERC 5 RAID-1).

I'm running DRBD version 8.0.0 pre 4, as shipped with Mandriva 2007.1, using heartbeat to manage failover.

eth0 on each server is connected to a Cisco 2950 and acts as the main service LAN.
eth1 on each server is connected via GigE crossover, for DRBD replication.
Heartbeat uses both interfaces for healthchecking.

I'm running PostgreSQL and another application from the DRBD partition, but neither is doing much yet since we're just testing things. (In fact PostgreSQL is all but idle, and the other app is just polling a bunch of servers every few minutes.)

/etc/drbd.conf:

##
global {
    usage-count yes;
}

common {
    syncer { rate 100M; }
}

resource r0 {
    protocol C;

    handlers {
        pri-on-incon-degr "halt -f";
        pri-lost-after-sb "halt -f";
        outdate-peer "/usr/sbin/drbd-peer-outdater";
    }

    startup {
    }

    disk {
        on-io-error detach;
    }

    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
    }

    syncer {
        rate 100M;
        al-extents 257;
    }

    on server1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   172.16.1.1:7788;
        meta-disk internal;
    }

    on server2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   172.16.1.2:7788;
        meta-disk internal;
    }
}
##

Quite frequently (say every 90 secs or so) all I/O on the DRBD device seems to stall, and interactive SSH sessions will hang for ~15 seconds.

Generally speaking, vmstat shows 25% I/O wait, while top shows one of the 4 CPUs at 100% I/O wait for extended periods. However, vmstat is also reporting that the actual bytes transferred are negligible; mostly it's <100 bytes/sec.

Top is not obviously showing any processes that may be causing this issue.

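For anyone wanting to log the per-CPU I/O wait figures described above across a stall, rather than watching top interactively, the following is a minimal sketch. It assumes the usual Linux /proc/stat layout (per-CPU lines of the form "cpuN user nice system idle iowait ..."); the function names parse_cpu_times, iowait_percent and sample are made up for illustration and are not part of DRBD or any existing tool.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: (iowait_jiffies, total_jiffies)} for each per-CPU line."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ... and skip the aggregate "cpu" line.
        if len(fields) >= 6 and fields[0].startswith("cpu") and fields[0] != "cpu":
            # Use at most the first 8 counters (user..steal); later guest
            # counters, where present, are already included in user time.
            values = [int(v) for v in fields[1:9]]
            times[fields[0]] = (values[4], sum(values))
    return times


def iowait_percent(before, after):
    """Percentage of elapsed jiffies each CPU spent in iowait between two samples."""
    result = {}
    for cpu, (wait_after, total_after) in after.items():
        if cpu not in before:
            continue
        wait_before, total_before = before[cpu]
        elapsed = total_after - total_before
        result[cpu] = 100.0 * (wait_after - wait_before) / elapsed if elapsed > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Read the stat file twice, `interval` seconds apart, and return per-CPU iowait %."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return iowait_percent(before, after)
```

Calling sample() in a loop and printing the result with a timestamp would give a record that can be lined up against the syslog entries; the interval of one second is an arbitrary choice.
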
Periodically, and not on the same frequency as the stalls, I see the following in my syslog:

Sep 16 14:49:22 server1 kernel: drbd0: ASSERT( b->n_req == set_size ) in drivers/block/drbd/drbd_main.c:299
Sep 16 14:49:22 server1 kernel: drbd0: b->n_req = 2 in drivers/block/drbd/drbd_main.c:307
Sep 16 14:49:22 server1 kernel: drbd0: set_size = 1 in drivers/block/drbd/drbd_main.c:308

Is there anything I can do to improve things? The raw horsepower these boxes have shouldn't be giving me anything like the I/O stalls I'm seeing.

Mark.

-- 
Mark Watts BSc RHCE MBCS
Senior Systems Engineer
QinetiQ Trusted Information Management
Trusted Solutions and Services Group
GPG Key: http://keyserver.veridis.com:11371/search?q=0x455420ED