Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I have a SuperMicro H8DGi-F based system with two AMD Opteron 6128 processors and 32GB of RAM. It has an LSI MegaRAID SAS 9265-8i card with 16 Seagate Constellation ES 2TB SAS2 drives in a RAID 6 shelf. The operating system is Oracle Linux 6.1, kernel 2.6.32-131.0.15.el6.x86_64. The DRBD packages came from ATrpms: http://packages.atrpms.net/dist/el6/drbd/

Here's the parted info:

[root@storageb drbd.d]# parted /dev/sda
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: LSI MR9265-8i (scsi)
Disk /dev/sda: 28.0TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  14.0TB  14.0TB
 2      14.0TB  28.0TB  14.0TB

Here is my drbd.conf:

# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";

Here is my global_common.conf:

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}
common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        disk {
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs
        }

        net {
                # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        }

        syncer {
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        }
}

Here is my r0.res:

# begin resource drbd0
resource r0 {
        protocol C;
        startup {
                wfc-timeout 15;
                degr-wfc-timeout 60;
        }
        net {
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        syncer {
        }
        on storagea {
                device    /dev/drbd0;
                disk      /dev/sda1;
                address   10.0.100.241:7788;
                meta-disk internal;
        }
        on storageb {
                device    /dev/drbd0;
                disk      /dev/sda1;
                address   10.0.100.242:7788;
                meta-disk internal;
        }
}

Here is the DRBD status from /proc/drbd:

version: 8.3.8.1 (api:88/proto:86-94)
GIT-hash: 0d8589fcc32c874df57c930ca1691399b55ec893 build by gardner@, 2011-05-21 19:18:16
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r---d
    ns:0 nr:0 dw:504388120 dr:2124 al:123972 bm:123845 lo:380 pe:0 ua:0 ap:381 ep:1 wo:b oos:13671456264
 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:13669326412

As you can see, I haven't connected the other node yet, so I'm assuming the network has nothing to do with the performance at this point.
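To double-check that, here is a rough sketch of how the connection, disk, and role state can be queried with drbdadm (using the r0 resource name from the config above); the expected values are just what /proc/drbd already shows:

    # Sketch: confirm the resource is running without a connected peer
    # before benchmarking; expected output is taken from /proc/drbd above.
    drbdadm cstate r0   # WFConnection - still waiting for the peer
    drbdadm dstate r0   # UpToDate/DUnknown
    drbdadm role r0     # Primary/Unknown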
Below is the script I ran to do the performance testing. I basically took the script from the user guide and removed the oflag=direct, because with that flag in place throughput dropped to 26MB/s (not really my focus here, but maybe related?). Anyway, it writes 64GB in 4KB blocks (16777216 x 4KB = 64GB), since there is 32GB of RAM in the box and the test needs to write well past what the page cache can absorb.

#!/bin/bash
TEST_RESOURCE=r0
TEST_DEVICE=$(drbdadm sh-dev $TEST_RESOURCE)
TEST_LL_DEVICE=$(drbdadm sh-ll-dev $TEST_RESOURCE)

drbdadm primary $TEST_RESOURCE

# Five passes against the DRBD device
for i in $(seq 5); do
    dd if=/dev/zero of=$TEST_DEVICE bs=4K count=16777216
done

drbdadm down $TEST_RESOURCE

# Five passes against the backing device
for i in $(seq 5); do
    dd if=/dev/zero of=$TEST_LL_DEVICE bs=4K count=16777216
done

And last of all, here is the result:

[root@storageb ~]# ./drbd-throughput-performance.sh
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 208.336 s, 330 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 200.24 s, 343 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 199.761 s, 344 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 178.938 s, 384 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 157.173 s, 437 MB/s
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 0 down' terminated with exit code 11
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 87.6337 s, 784 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 90.8693 s, 756 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 89.9055 s, 764 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 91.1342 s, 754 MB/s
16777216+0 records in
16777216+0 records out
68719476736 bytes (69 GB) copied, 91.9999 s, 747 MB/s

In conclusion, I'm seeing an almost 50% decrease in write throughput when going through DRBD (roughly 330-440 MB/s on /dev/drbd0 versus 745-785 MB/s on the raw backing device). Is this what I should expect, or am I doing something wrong?

Thanks for all your help in advance.

~Noah
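P.S. For reference, the user-guide version of this test keeps oflag=direct on the dd writes, which bypasses the page cache entirely; that is the variant that only managed ~26MB/s here. A sketch of what that loop looks like with the same variables as the script above:

    # Same dd loop as above, but with oflag=direct restored as in the
    # user-guide script (O_DIRECT writes, no page cache) - the ~26MB/s case.
    for i in $(seq 5); do
        dd if=/dev/zero of=$TEST_DEVICE bs=4K count=16777216 oflag=direct
    done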