Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all,

I've been having significant performance issues with my DRBD storage, and I'm trying to work out what I need to do to fix it. I'm assuming the purchase of new hardware will be part of the solution, but I don't want to do that if it isn't going to help.

I'm running DRBD with the secondary disconnected from 7am to 7pm, so these performance figures are not affected by the network connection to the secondary, nor by the performance of the secondary server itself. The primary is 5 x 480G Intel SSDs in RAID5 using Linux md (no battery-backed storage).

I have been collecting data (every 10 secs) into an RRD from /sys/block/<device>/stat, which includes the reads/writes as well as the backlog and activetime. In particular, during times of high read/write, I see spikes in the corresponding backlog and activetime values.

1) Does a high value for backlog imply that a lower layer is slowing things down? i.e., if my drbd2 device shows a backlog value of 2100 (my peak in the past two hours), does that mean the underlying md device is too slow? At the same time drbd2 had a backlog value of 2100, the underlying drives sd[b-f] had high values of around 70 to 100.

Graphs can be seen at:
http://203.98.89.64/graphs/
You can see backlog/activetime for drbd2 and for each of the SSDs in the RAID5 array.

2) Does it follow that if I improve the performance of the underlying RAID, then DRBD will improve, reducing the backlog and making the users happy?

/proc/mdstat:
md1 : active raid5 sdf1[0] sdc1[4] sdb1[5] sdd1[3] sde1[1]
      1863535104 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 4/4 pages [16KB], 65536KB chunk

There is only a single partition on each drive, and I have left a small part of each drive empty (which is supposed to help performance because of the lack of TRIM). Each SSD has its scheduler set to deadline (which did seem to help a little).

DRBD config:

resource storage2 {
    protocol A;
    device /dev/drbd2 minor 2;
    meta-disk internal;
    on san1 {
        address 172.17.56.1:7802;
        disk /dev/md1;
    }
    on san2 {
        address 172.17.56.2:7802;
        disk /dev/md0;
    }
    net {
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
        max-buffers 16000;
        max-epoch-size 16000;
        unplug-watermark 8192;
        sndbuf-size 512k;
        on-congestion pull-ahead;
    }
    startup {
        wfc-timeout 10;
        degr-wfc-timeout 20;
    }
    syncer {
        verify-alg crc32c;
        rate 20M;
        al-extents 3389;
    }
}

Any other comments on this config which might impact performance?

Basically, I just want to rule out DRBD as the performance-limiting factor and focus on the underlying hardware, but there is no point throwing hardware at it if the problem is actually somewhere else.

Finally, any suggestions on specific hardware which could be added to improve the performance (or software, or config changes)?

Thanks,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au
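
PS: For context on the backlog/activetime figures above: as far as I understand the kernel's block layer documentation, these correspond to the io_ticks and time_in_queue fields of /sys/block/<device>/stat, and both are cumulative millisecond counters. The snippet below is just a minimal illustrative sketch (not my actual collector, and the device list and 10-second interval are examples only) of taking two samples and printing the per-interval deltas that end up in the graphs, plus a rough utilisation percentage derived from io_ticks.

#!/usr/bin/env python
# Sketch: sample io_ticks ("activetime") and time_in_queue ("backlog")
# from /sys/block/<device>/stat and print per-interval deltas.
# Device names and the 10-second interval are examples only.
import time

DEVICES = ["drbd2", "md1", "sdb", "sdc", "sdd", "sde", "sdf"]
INTERVAL = 10  # seconds between the two samples

def read_stat(dev):
    """Return (io_ticks_ms, time_in_queue_ms) for a block device."""
    with open("/sys/block/%s/stat" % dev) as f:
        fields = f.read().split()
    # Per Documentation/block/stat.txt: fields 0-3 are reads, 4-7 writes,
    # 8 in_flight, 9 io_ticks (ms), 10 time_in_queue (ms).
    return int(fields[9]), int(fields[10])

before = {dev: read_stat(dev) for dev in DEVICES}
time.sleep(INTERVAL)
for dev in DEVICES:
    ticks0, queue0 = before[dev]
    ticks1, queue1 = read_stat(dev)
    # io_ticks delta over the wall-clock interval approximates %util,
    # the same figure iostat -x reports.
    util = 100.0 * (ticks1 - ticks0) / (INTERVAL * 1000)
    print("%-6s activetime=%6d ms  backlog=%6d ms  util=%5.1f%%"
          % (dev, ticks1 - ticks0, queue1 - queue0, util))

A large backlog delta on drbd2 with only modest deltas on the member disks is the pattern I'm asking about in question 1, so a sampler like this run against both layers at the same time is how I'm comparing them.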