[DRBD-user] Secondary Performance

Thu Dec 20 08:31:13 CET 2012

On 20/12/12 09:59, Prater, James K. wrote:
> What is/are the actual storage device(s)?

The primary has 5 x 480GB Intel SSD drives in a RAID5 configuration.
The secondary has 4 x 2TB WD RE4 drives in a RAID10 configuration.

The storage stack is md RAID, then DRBD, then LVM, then iSCSI.

Neither the primary nor the secondary is using a BBU RAID controller.
All systems are protected by a UPS, and the secondary (plus half the
servers) is on a second UPS (and a second circuit).

I don't wish to discard the option of purchasing additional hardware,
but it is unlikely to be possible in the next 6 months, so I would like
to 'tune' the software side to avoid that need for now. Of course, I
don't want to undermine the stability of the system, but some risks are
acceptable (as always, it is a risk vs cost analysis).

Since this morning, I've summarised the logs below to show the amount
of data that was out of sync (in kB), and the time from becoming out of
sync to being back in sync (in seconds).

Size (kB)	Time (s)
8756	4
7940	2
1664	2
5340	2
10656	5
122200	25
21168	7
5540	4
13216	4
8772	5
4880	4
41208	10
47776	12
7680	5
8360	5
35100	6
4400	2
3412	2
8260	3
12896	4
8680	3
14952	3
17824	7
37208	8
4584	3
13512	4
14416	4
4248	4
13884	4
8472	5
10804	4
4100	3
9020	4
8376	3
4884	3
13440	4
6736	4
9704	4
6008	3
2512	4
6000	4
1548	2
56216	10
6804	4
4872	4
10996	6
2360	2
144216	18
4260	4
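
For anyone wanting to produce the same kind of summary, here is a rough
sketch of how it could be pulled out of the kernel log. It assumes the
message wording shown in the DRBD 8.3.15 log quoted below, that each log
entry is on a single unwrapped line, that only one DRBD device is logging,
and that the year (which syslog does not record) is passed in by hand. The
function names are my own and not part of any DRBD tool.

```python
import re
from datetime import datetime

# Patterns assumed from the DRBD 8.3.15 messages quoted in this thread.
AHEAD_RE = re.compile(r"conn\( Connected -> Ahead \)")
SYNC_RE = re.compile(r"Began resync as SyncSource \(will sync (\d+) KB")
DONE_RE = re.compile(r"conn\( SyncSource -> Connected \)")
STAMP_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def parse_stamp(line, year):
    """Return the syslog timestamp of a line as a datetime (year supplied)."""
    match = STAMP_RE.match(line)
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def summarise_log(lines, year=2012):
    """Yield (size_kb, seconds) for each Ahead -> resync -> Connected cycle."""
    started = None
    size_kb = None
    for line in lines:
        if AHEAD_RE.search(line):
            # The secondary has just fallen behind: start the clock.
            started = parse_stamp(line, year)
            size_kb = None
        elif started is not None:
            match = SYNC_RE.search(line)
            if match:
                size_kb = int(match.group(1))
            elif DONE_RE.search(line) and size_kb is not None:
                # Back in sync: stop the clock and report the cycle.
                elapsed = parse_stamp(line, year) - started
                yield size_kb, int(elapsed.total_seconds())
                started = None


SAMPLE = [
    "Dec 20 08:16:57 san1 kernel: [1235131.854172] block drbd2: "
    "conn( Connected -> Ahead ) pdsk( UpToDate -> Consistent )",
    "Dec 20 08:17:00 san1 kernel: [1235134.536151] block drbd2: "
    "Began resync as SyncSource (will sync 8756 KB [2189 bits set]).",
    "Dec 20 08:17:01 san1 kernel: [1235135.764091] block drbd2: "
    "conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )",
]

for size_kb, seconds in summarise_log(SAMPLE):
    print(f"{size_kb}\t{seconds}")
```

With several DRBD devices on one host, the lines would need to be grouped
by device name (drbd2 here) before being fed in.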

In general, most of the events in the table above are small amounts of
data over short periods of time, so it would be nice to add a little more
buffering 'somewhere', so that instead of going out of sync, the
secondary just has more data waiting in a buffer, not yet written to
disk.
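
To put a rough number on "most", a quick sketch like the following counts
how many of the events in the table would have fit inside a given amount of
extra buffering. The 16 MiB threshold is only an assumption, taken from the
figure I asked about in my original message, and this says nothing about
whether DRBD can actually buffer that much; it only describes the logged
events.

```python
from statistics import median

# (size_kb, seconds) pairs copied from the table above.
EVENTS = [
    (8756, 4), (7940, 2), (1664, 2), (5340, 2), (10656, 5),
    (122200, 25), (21168, 7), (5540, 4), (13216, 4), (8772, 5),
    (4880, 4), (41208, 10), (47776, 12), (7680, 5), (8360, 5),
    (35100, 6), (4400, 2), (3412, 2), (8260, 3), (12896, 4),
    (8680, 3), (14952, 3), (17824, 7), (37208, 8), (4584, 3),
    (13512, 4), (14416, 4), (4248, 4), (13884, 4), (8472, 5),
    (10804, 4), (4100, 3), (9020, 4), (8376, 3), (4884, 3),
    (13440, 4), (6736, 4), (9704, 4), (6008, 3), (2512, 4),
    (6000, 4), (1548, 2), (56216, 10), (6804, 4), (4872, 4),
    (10996, 6), (2360, 2), (144216, 18), (4260, 4),
]

# Assumed threshold: 16 MiB expressed in kB.
THRESHOLD_KB = 16 * 1024


def summarise_events(events, threshold_kb=THRESHOLD_KB):
    """Return simple summary figures for a list of (size_kb, seconds) pairs."""
    sizes = [size for size, _ in events]
    times = [seconds for _, seconds in events]
    return {
        "events": len(events),
        "at_or_below_threshold": sum(1 for s in sizes if s <= threshold_kb),
        "median_size_kb": median(sizes),
        "max_size_kb": max(sizes),
        "median_seconds": median(times),
        "max_seconds": max(times),
    }


for key, value in summarise_events(EVENTS).items():
    print(f"{key}: {value}")
```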

Any suggestions or advice would be greatly appreciated.

Regards,
Adam

> ----- Original Message -----
> From: Adam Goryachev [mailto:mailinglists at websitemanagers.com.au]
> Sent: Wednesday, December 19, 2012 05:01 PM
> To: drbd-user at lists.linbit.com <drbd-user at lists.linbit.com>
> Subject: [DRBD-user] Secondary Performance
> 
> Hi,
> 
> I've been having a problem with performance of DRBD while the secondary
> is connected, and I've tried various things (which have helped) but not
> resolved the issue. So last night I upgraded to DRBD 8.3.15 and have
> been running with that for the past 6 hours or so. I also enabled the
> option:
> on-congestion pull-ahead;
> and changed my protocol to A (because I couldn't enable that option
> otherwise).
> 
> Since then, I've received the following log messages:
> Dec 20 08:16:57 san1 kernel: [1235131.854167] block drbd2:
> Congestion-extents threshold reached
> Dec 20 08:16:57 san1 kernel: [1235131.854172] block drbd2: conn(
> Connected -> Ahead ) pdsk( UpToDate -> Consistent )
> Dec 20 08:17:00 san1 kernel: [1235134.535128] block drbd2: helper
> command: /sbin/drbdadm before-resync-source minor-2
> Dec 20 08:17:00 san1 kernel: [1235134.536144] block drbd2: helper
> command: /sbin/drbdadm before-resync-source minor-2 exit code 0 (0x0)
> Dec 20 08:17:00 san1 kernel: [1235134.536148] block drbd2: conn( Ahead
> -> SyncSource ) pdsk( Consistent -> Inconsistent )
> Dec 20 08:17:00 san1 kernel: [1235134.536151] block drbd2: Began resync
> as SyncSource (will sync 8756 KB [2189 bits set]).
> Dec 20 08:17:00 san1 kernel: [1235134.543422] block drbd2: updated sync
> UUID 75FEFEE50FF154E9:59B7F817FEF13BD6:8E57A8552E6B2153:8E56A8552E6B2153
> Dec 20 08:17:01 san1 kernel: [1235135.764082] block drbd2: Resync done
> (total 1 sec; paused 0 sec; 8756 K/sec)
> Dec 20 08:17:01 san1 kernel: [1235135.764086] block drbd2: updated UUIDs
> 75FEFEE50FF154E9:0000000000000000:59B7F817FEF13BD6:8E57A8552E6B2153
> Dec 20 08:17:01 san1 kernel: [1235135.764091] block drbd2: conn(
> SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> Dec 20 08:17:01 san1 kernel: [1235135.828036] block drbd2: bitmap WRITE
> of 0 pages took 0 jiffies
> Dec 20 08:17:01 san1 kernel: [1235135.828038] block drbd2: 0 KB (0 bits)
> marked out-of-sync by on disk bit-map.
> 
> How might I read this information to determine what the shortfall in
> performance is? It looks like there were some blocks (or portions of
> blocks), totalling 8756 KB, that the secondary fell behind on. This
> doesn't seem like a lot to fall behind by, considering that everything
> was caught up again in around 4 seconds.
> 
> Is there perhaps some additional buffering or tuning I could do within
> DRBD (or on the secondary) to make an additional 16M worth of buffering
> available?
> 
> Of course, the above is just one instance of falling behind, and if it
> happens frequently enough, or by a large enough value, then the only
> solution will be to upgrade the hardware in the machine. However, if it
> is only falling behind by a small amount then I'd rather just increase the
> buffering a bit.
> 
> Regards,
> Adam
> 

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au