On 09/01/2010 06:29 PM, Robert Verspuy wrote:
> *Latency: (changed to write 1000 times 4k)*
>
> DRBD Connected:
> 4096000 bytes (4.1 MB) copied, 24.7744 seconds, 165 kB/s
>
> DRBD Disconnected:
> 4096000 bytes (4.1 MB) copied, 0.198809 seconds, 20.6 MB/s

I've moved the meta-data to a ramdisk on both servers, but the latency is still the same. I also tried the deadline scheduler, enabled and disabled.

One other thing I noticed was the number of interrupts on the secondary server. The 8 SATA disks are connected to 2 SATA controllers. During the latency test, the interrupt rates were:

primary server, sata_mv card: around 1170
secondary server, sata_mv card: around 1804
primary server, ahci card: around 376
secondary server, ahci card: around 4308

So with the ahci driver I see about 10 times more interrupts on the secondary server than on the primary server.

I just tried adding the option no-disk-barrier, but it made no difference. When I added no-disk-flushes, I suddenly got good performance numbers. So the extra latency is somewhere in the flushing that DRBD does on the md device. But why is flushing on the secondary so much slower than flushing on the primary?

I also tested this with db01 running as primary, db02 disconnected, and the latency test running on /dev/drbd0. I assume (because I'm still writing to the DRBD device, just without an active secondary) that DRBD is still issuing the flushes on /dev/md2? I also tested with db02 as primary and db01 as secondary, to see whether there might be any RAID problem on db02, but I get exactly the same results.
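The flush cost can be illustrated locally with plain dd, without DRBD in the picture; here is a rough sketch against a scratch file rather than /dev/drbd0 (the file path is an assumption, and oflag=dsync is used only as a stand-in for the per-request flushes DRBD issues, not as a claim about DRBD internals):

```shell
#!/bin/sh
# Rough illustration of per-write flush cost, not the original test setup.
# /tmp/latency-test.img is an assumed scratch file, NOT /dev/drbd0.
F=/tmp/latency-test.img

# 1000 x 4k writes with no flush per write (the page cache absorbs them)
dd if=/dev/zero of=$F bs=4k count=1000 2>&1 | tail -n1

# the same writes, but synced to stable storage on every write (oflag=dsync),
# which crudely mimics what flush handling adds on each request
dd if=/dev/zero of=$F bs=4k count=1000 oflag=dsync 2>&1 | tail -n1
```

On rotating disks the second dd is typically one to two orders of magnitude slower, which matches the connected-vs-disconnected gap above.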
When running the latency test on the primary without the secondary attached:

[root@db02 drbd]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown B r----
    ns:4000 nr:4000 dw:12000 dr:0 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:4000
[root@db02 drbd]# dd if=/dev/zero of=/dev/drbd0 bs=4k count=1000 oflag=direct
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 0.205476 seconds, 19.9 MB/s

With the secondary attached:

[root@db02 drbd]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate B r----
    ns:8000 nr:4000 dw:12000 dr:4000 al:1 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@db02 drbd]# dd if=/dev/zero of=/dev/drbd0 bs=4k count=1000 oflag=direct
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 27.4238 seconds, 149 kB/s

But when using Protocol C with no-disk-barrier and no-disk-flushes, I get:

[root@db01 drbd-test]# dd if=/dev/zero of=/dev/drbd0 bs=4k count=1000 oflag=direct
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 0.504855 seconds, 8.1 MB/s

According to dstat, the 4.1 MB is written to disk on the secondary within 2 seconds after the dd command.

The documentation states that you must not use no-disk-barrier and no-disk-flushes unless you have a RAID controller with battery backup. But is this really going to cause me problems? Both servers have 2 power supplies, attached to 2 separate power feeds in a data centre, so the chance that both servers lose power at the same time is already very small. And even if both servers do lose power, the PostgreSQL database is ACID, so the data will always be consistent.
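For reference, those two options go in the disk section of the resource in DRBD 8.3. A minimal sketch, assuming the device names from this thread (the IP addresses, port, and internal meta-disk are placeholders, not taken from the actual setup, which keeps its meta-data on a ramdisk):

```
resource r0 {
  protocol C;
  disk {
    no-disk-barrier;   # skip write barriers on the backing device
    no-disk-flushes;   # skip disk flushes -- only safe with battery-backed cache
  }
  on db01 {
    device    /dev/drbd0;
    disk      /dev/md2;
    address   10.0.0.1:7788;   # placeholder address
    meta-disk internal;        # placeholder; real setup uses a ramdisk
  }
  on db02 {
    device    /dev/drbd0;
    disk      /dev/md2;
    address   10.0.0.2:7788;   # placeholder address
    meta-disk internal;        # placeholder; real setup uses a ramdisk
  }
}
```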
I may then lose a few database updates or inserts, but we're not running a database for banking or a nuclear power reactor :) So I think I will set up DRBD with Protocol C plus no-disk-barrier and no-disk-flushes.

--
*Exa-Omicron*
Patroonsweg 10
3892 DB Zeewolde
Tel.: 088-OMICRON (66 427 66)
http://www.exa-omicron.nl