Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi everyone,

I have a DRBD performance problem that has me completely confused, and I'm hoping someone can help, since my other servers that use the same type of RAID card and DRBD don't have this problem.

For the hardware, I have two Dell R515 servers with the H700 controller (basically an LSI MegaRAID based card), running SLES 11 SP1. The problem shows up on DRBD 8.3.11, 8.3.12, and 8.4.1; I haven't tested other versions. Here is the simple config I made, based on the servers that don't have any issues:

global {
    # We don't want to be bothered by the usage count numbers
    usage-count no;
}

common {
    protocol C;
    net {
        cram-hmac-alg md5;
        shared-secret "P4ss";
    }
}

resource r0 {
    on san1 {
        device  /dev/drbd0;
        disk    /dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part2;
        address 10.60.60.1:63000;
        flexible-meta-disk /dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part1;
    }
    on san2 {
        device  /dev/drbd0;
        disk    /dev/disk/by-id/scsi-36782bcb0698b6e00167bb1d107a77a47-part2;
        address 10.60.60.2:63000;
        flexible-meta-disk /dev/disk/by-id/scsi-36782bcb0698b6e00167bb1d107a77a47-part1;
    }
    startup {
        wfc-timeout 5;
    }
    syncer {
        rate 50M;
        cpu-mask 4;
    }
    disk {
        on-io-error detach;
        no-disk-barrier;
        no-disk-flushes;
        no-disk-drain;
        no-md-flushes;
    }
}

Current status from /proc/drbd:

version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by phil@fat-tyre, 2011-06-29 11:37:11
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----s
    ns:0 nr:0 dw:8501248 dr:551 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:3397375600

Even running with just one server and no replication, the performance hit with DRBD is huge. The backing device shows a throughput of:

----
san1:~ # dd if=/dev/zero of=/dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part2 bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 16.4434 s, 1.0 GB/s
----

while the same write through the DRBD device gets:

----
san1:~ # dd if=/dev/zero of=/dev/drbd/by-res/r0 bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 93.457 s, 184 MB/s
----

Using iostat I can see part of the problem:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00   16.76    0.00    0.00   83.17

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb           20565.00         0.00       360.00          0        719
drbd0        737449.50         0.00       360.08          0        720

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00   28.87    1.37    0.00   69.69

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.50         0.00         0.01          0          0
sdb           57859.50         0.00       177.22          0        354
drbd0        362787.00         0.00       177.14          0        354

The DRBD device is showing roughly 10x to 20x the TPS of the backing store for the same write rate. When I run the same test on my other servers I don't see anything like this, and those working servers run the same kernel and DRBD versions.

Does anyone have any ideas about how this might be resolved? I'm at a loss right now.
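P.S. Since the iostat numbers suggest DRBD is submitting far smaller requests than the backing device ends up writing, I can also collect the average request size and the queue limits on both devices if that would help. A sketch of the commands I'd use (device names taken from above; the sysfs paths are assumed to exist on this kernel, I haven't verified them on these boxes yet):

----
# per-device average request size (avgrq-sz column, in 512-byte sectors)
iostat -x sdb drbd0 1

# request-size limits the kernel reports for each queue
cat /sys/block/sdb/queue/max_sectors_kb   /sys/block/sdb/queue/max_hw_sectors_kb
cat /sys/block/drbd0/queue/max_sectors_kb /sys/block/drbd0/queue/max_hw_sectors_kb
----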
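I also plan to repeat the dd runs with direct I/O to rule out page-cache effects, something like the following (same sizes as the tests above; oflag=direct assumes GNU dd):

----
san1:~ # dd if=/dev/zero of=/dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part2 bs=1M count=16384 oflag=direct
san1:~ # dd if=/dev/zero of=/dev/drbd/by-res/r0 bs=1M count=16384 oflag=direct
----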
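And for completeness, these are the net-section knobs I was going to experiment with next; the values are only starting guesses on my part, not recommendations from anywhere:

----
net {
    # guesses: raise buffering above the defaults for a ~1 GB/s backing store
    max-buffers      8000;
    max-epoch-size   8000;
    unplug-watermark 16;
    sndbuf-size      512k;
}
----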