Subject: [DRBD-user] Slower disk throughput on DRBD partition

<DIV dir=ltr><FONT size=2 face="Courier New">Hi Everyone,</FONT></DIV>
<DIV dir=ltr><FONT size=2 face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT size=2 face="Courier New">I have a DRBD performance problem that has got me completely confused. I hoping that someone can help with this one as my other servers that use the same type of RAID cards and DRBD don't have this problem.</FONT></DIV>
<DIV dir=ltr><FONT size=2 face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New"><FONT size=2>For the hardware, I have two Dell R515 servers with the H700 card, basically an LSI Megaraid based card, and running SLES 11 SP1. </FONT><FONT size=2>This problem shows up on drbd 8.3.11, 8.3.12, and 8.4.1 but I haven't tested other versions.</FONT></FONT></DIV>
<DIV dir=ltr><FONT size=2 face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT size=2 face="Courier New">here is the simple config I made based on the servers that don't have any issues:</FONT></DIV>
<DIV dir=ltr><FONT size=2 face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New">global {<BR> # We don't want to be bother by the usage count numbers<BR> usage-count no;<BR>}</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">common {<BR> protocol C;</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"> net {<BR> cram-hmac-alg md5;<BR> shared-secret "P4ss";<BR> }<BR>}</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">resource r0 {<BR> on san1 {<BR> device /dev/drbd0;<BR> disk /dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part2;<BR> address 10.60.60.1:63000;<BR> flexible-meta-disk /dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part1;<BR> }</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"> on san2 {<BR> device /dev/drbd0;<BR> disk /dev/disk/by-id/scsi-36782bcb0698b6e00167bb1d107a77a47-part2;<BR> address 10.60.60.2:63000;<BR> flexible-meta-disk /dev/disk/by-id/scsi-36782bcb0698b6e00167bb1d107a77a47-part1;<BR> }</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"> startup {<BR> wfc-timeout 5;<BR> }</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"> syncer {<BR> rate 50M;<BR> cpu-mask 4;<BR> }</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"> disk {<BR> on-io-error detach;<BR> no-disk-barrier;<BR> no-disk-flushes;<BR> no-disk-drain;<BR> no-md-flushes;<BR> }<BR>}</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New">version: 8.3.11 (api:88/proto:86-96)<BR>GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by </FONT><A href="mailto:phil@fat-tyre"><FONT face="Courier New">phil@fat-tyre</FONT></A><FONT face="Courier New">, 2011-06-29 11:37:11<BR> 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----s<BR> ns:0 nr:0 dw:8501248 dr:551 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:3397375600</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New">So, when I'm running just with one server and no replication the performance hit with DRBD is huge. The backing device shows a throughput of:</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">----</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">san1:~ # dd if=/dev/zero of=/dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part2 bs=1M count=16384<BR>16384+0 records in<BR>16384+0 records out<BR>17179869184 bytes (17 GB) copied, 16.4434 s, 1.0 GB/s</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">----</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">san1:~ # dd if=/dev/zero of=/dev/drbd/by-res/r0 bs=1M count=16384<BR>16384+0 records in<BR>16384+0 records out<BR>17179869184 bytes (17 GB) copied, 93.457 s, 184 MB/s</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">-------</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New">using iostat I see part of the problem:</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New">avg-cpu: %user %nice %system %iowait %steal %idle<BR> 0.08 0.00 16.76 0.00 0.00 83.17</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn<BR>sda 0.00 0.00 0.00 0 0<BR>sdb 20565.00 0.00 360.00 0 719<BR>drbd0 737449.50 0.00 360.08 0 720</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">avg-cpu: %user %nice %system %iowait %steal %idle<BR> 0.07 0.00 28.87 1.37 0.00 69.69</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New">Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn<BR>sda 1.50 0.00 0.01 0 0<BR>sdb 57859.50 0.00 177.22 0 354<BR>drbd0 362787.00 0.00 177.14 0 354</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New">the drbd device is showing a TPS about 10x - 20x of the backing store. When I do this on my other servers I don't see anything like it. The working servers are also running the same kernel and drbd versions. </FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"></FONT> </DIV>
<DIV dir=ltr><FONT face="Courier New">Does anyone have any ideas of how this might be resolved or fixed? I'm at a loss right now.</FONT></DIV>
<DIV dir=ltr><FONT face="Courier New"></FONT> </DIV></DIV></BODY></HTML>