[DRBD-user] using "disk-drain" with plain scsi drives?

Wed Jul 1 23:43:12 CEST 2009

Hello,

I'm trying to set up and get the best performance out of the following
configuration:

2 Linux x86_64 systems, as close to "identical" as it gets (except the
hard drive models/firmware may differ), same software running on both
(kernel, DRBD versions, everything else).
DRBD version 8.3.1, Linux 2.6.27.7, Heartbeat 2.1.4

Systems are connected with copper GigE connection, dedicated to DRBD.

DRBD devices are created on top of plain SCSI devices (AIC-7902B U320
card, aic79xx, sd drivers).
The filesystem is ext3, I'm using protocol C in Primary/Secondary mode.
SCSI queue_depth is set to the same value on both systems (currently
32). 
The system is running MySQL (InnoDB and MyISAM), the workload is
write-bound.

Is there any potential danger to data integrity in using the
'disk-drain' method in this configuration?

I've run several benchmarks. Using disk barriers gave me the worst
results and "no-disk-barrier" is slightly better, but using both
"no-disk-barrier" and "no-disk-flushes" is the only combination so far
that produced reasonable performance, considerably better than the other
two. I've pasted some of the results below.

However, the drbd.conf man page says that "In case your backing storage
device has a volatile write cache (plain disks, RAID of plain disks) you
should use one of the first two [options]".
Does this mean that the "disk-drain" method is unsafe to use in this
case, i.e. that data integrity may be compromised? As far as I
understand, it probably shouldn't be, but I would appreciate any advice.
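
To make the fallback order concrete, here is a minimal Python sketch of
how I read the man page: DRBD uses the first write-ordering method
(barrier, flush, drain, none) that the backing device supports and that
is not disabled in the disk section. The function effective_method and
its "unsupported" parameter are hypothetical names for illustration
only, not part of DRBD.

```python
# Write-ordering methods in the order the drbd.conf man page lists them.
METHODS = ["barrier", "flush", "drain", "none"]

# drbd.conf disk options that disable the corresponding method.
DISABLING_OPTION = {
    "barrier": "no-disk-barrier",
    "flush": "no-disk-flushes",
    "drain": "no-disk-drain",
}


def effective_method(disk_options, unsupported=()):
    """Return the first method that is neither disabled nor unsupported.

    disk_options: option names present in the disk section.
    unsupported:  methods the backing device cannot do (an assumption
                  supplied by the caller; DRBD detects this itself).
    """
    for method in METHODS:
        if DISABLING_OPTION.get(method) in disk_options:
            continue  # disabled by the user in drbd.conf
        if method in unsupported:
            continue  # backing device cannot provide this method
        return method
    return "none"


# The disk section from the configuration in this post.
print(effective_method({"no-disk-barrier", "no-disk-flushes"}))
```

With the two options from my common section this selects "drain", which
is the method the man page places third, after the two it recommends
for devices with a volatile write cache. According to the same man
page, the letter after "wo:" in /proc/drbd (b, f, d or n) shows which
method is actually in use.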

I also tried tweaking other options (sndbuf-size, max-buffers, jumbo
frames on the NIC, a separate GigE link for each DRBD device), without
much improvement.
A sample from my current drbd.conf (one device; the other one is
configured identically):

common {
  syncer { rate 30M; }
  net {
    sndbuf-size 256k;
    max-buffers 256;
  }
  disk {
    no-disk-barrier;
    no-disk-flushes;
  }
}

resource innodb {
  protocol C;

  disk {
    on-io-error   pass_on;
    fencing resource-only;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    al-extents 3833;
  }

  on node1 {
    device     /dev/drbd0;
    disk       /dev/sdb10;
    address    192.168.0.75:7788;
    meta-disk  internal;
  }

  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb10;
    address   192.168.0.95:7788;
    meta-disk internal;
  }
}

Some of the performance results:

local drive, non-drbd
dd if=/dev/zero bs=4096 count=10000 of=/data2/ddtest1 oflag=dsync
40960000 bytes (41 MB) copied, 14.7353 s, 2.8 MB/s

DRBD with "disk-barrier" (no write-ordering option specified)
(drbd disconnected)
dd if=/dev/zero bs=4096 count=10000 of=/data/ddtest1 oflag=dsync
40960000 bytes (41 MB) copied, 25.8164 s, 1.6 MB/s
(drbd connected)
dd if=/dev/zero bs=4096 count=10000 of=/data/ddtest1 oflag=dsync
40960000 bytes (41 MB) copied, 321.029 s, 128 kB/s

DRBD with "no-disk-barrier"
(drbd connected)
dd if=/dev/zero bs=4096 count=10000 of=/data/ddtest1 oflag=dsync
40960000 bytes (41 MB) copied, 253.917 s, 161 kB/s

DRBD with "no-disk-barrier" and "no-disk-flushes" (using "disk-drain")
(drbd connected)
dd if=/dev/zero bs=4096 count=10000 of=/data/ddtest1 oflag=dsync
40960000 bytes (41 MB) copied, 35.4789 s, 1.2 MB/s

ping -w 10 -f -s 4100 192.168.0.75
PING 192.168.0.75 (192.168.0.75) 4100(4128) bytes of data.
.
--- 192.168.0.75 ping statistics ---
39816 packets transmitted, 39815 received, 0% packet loss, time 9999ms
rtt min/avg/max/mdev = 0.141/0.219/0.710/0.029 ms, ipg/ewma 0.251/0.220 ms
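
To put the dd and ping numbers side by side, here is a small Python
sketch that converts each run into an average time per synchronous
4096-byte write and compares it with the average ping round trip. All
figures are copied from the output above; the run labels and function
names are my own shorthand, and the comparison is only rough, since
ping measures network round-trip time and nothing else.

```python
# All figures below are copied from the dd and ping output in this post.
WRITES = 10000           # dd count=10000
BLOCK_BYTES = 4096       # dd bs=4096
PING_RTT_AVG_MS = 0.219  # avg rtt from the flood ping (4100-byte payload)

# Elapsed seconds per dd run; the labels are my own shorthand.
ELAPSED_S = {
    "local drive, non-drbd": 14.7353,
    "disk-barrier, disconnected": 25.8164,
    "disk-barrier, connected": 321.029,
    "no-disk-barrier, connected": 253.917,
    "disk-drain, connected": 35.4789,
}


def per_write_ms(elapsed_s, writes=WRITES):
    """Average time per synchronous write, in milliseconds."""
    return elapsed_s / writes * 1000.0


def throughput_bytes_per_s(elapsed_s, writes=WRITES, block_bytes=BLOCK_BYTES):
    """Average throughput in bytes per second, as dd reports it."""
    return writes * block_bytes / elapsed_s


for label, elapsed in ELAPSED_S.items():
    ms = per_write_ms(elapsed)
    print(f"{label:28s} {ms:7.2f} ms/write  "
          f"{throughput_bytes_per_s(elapsed) / 1000:8.1f} kB/s  "
          f"{ms / PING_RTT_AVG_MS:6.1f} x ping rtt")
```

On these numbers a connected write costs roughly 32 ms with barriers
and roughly 3.5 ms with "disk-drain", against about 1.5 ms on the local
drive and an average network round trip of 0.219 ms.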

Thanks for your advice, and thanks a lot for the great software!
Michael