On 20/12/12 09:59, Prater, James K. wrote:
> What is/are the actual storage device(s)?

The primary has 5 x 480G Intel SSD drives in a RAID5 configuration.
The secondary has 4 x 2TB WD RE4 drives in a RAID10 configuration.

RAID is using md raid, then drbd, then LVM, then iSCSI.

Neither the primary nor the secondary is using a BBU RAID controller.
All systems are connected by UPS, and the secondary (plus half the
servers) are on a second UPS (and a second circuit).

I don't wish to discard the option of purchasing additional hardware,
but it is unlikely to be possible in the next 6 months, so would like
to 'tune' the software side to avoid that need for now. Of course, I
also don't want to undermine the stability of the system either, but
some risks are acceptable. (as always, risk vs cost analysis)...

Since this morning, I've now summarised the below logs to show the
amount of data that was out of sync (in kB), and the time from becoming
out of sync to back in sync (in seconds).

Size (kB)  Time (s)
8756       4
7940       2
1664       2
5340       2
10656      5
122200     25
21168      7
5540       4
13216      4
8772       5
4880       4
41208      10
47776      12
7680       5
8360       5
35100      6
4400       2
3412       2
8260       3
12896      4
8680       3
14952      3
17824      7
37208      8
4584       3
13512      4
14416      4
4248       4
13884      4
8472       5
10804      4
4100       3
9020       4
8376       3
4884       3
13440      4
6736       4
9704       4
6008       3
2512       4
6000       4
1548       2
56216      10
6804       4
4872       4
10996      6
2360       2
144216     18
4260       4

In general, most of those are small amounts of data over small amounts
of time, so it would be nice to add a little more buffering 'somewhere'
so that instead of getting out of sync, the secondary just has more
data un-written to disk in a buffer somewhere.

Any suggestions or advice would be greatly appreciated.
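As a rough check on the "most of those are small" observation, the table can be summarised with a short script. This is only a sketch: the function name `summarise` and the `buffer_kb` parameter are made up for illustration, and the 16 MB threshold is simply the figure floated in the original message. It also assumes the size DRBD reports at resync start is a fair stand-in for the backlog a buffer would have had to hold, which may not be the case.

```python
from statistics import median

# (size_kB, seconds) pairs copied from the summary table above.
EPISODES = [
    (8756, 4), (7940, 2), (1664, 2), (5340, 2), (10656, 5), (122200, 25),
    (21168, 7), (5540, 4), (13216, 4), (8772, 5), (4880, 4), (41208, 10),
    (47776, 12), (7680, 5), (8360, 5), (35100, 6), (4400, 2), (3412, 2),
    (8260, 3), (12896, 4), (8680, 3), (14952, 3), (17824, 7), (37208, 8),
    (4584, 3), (13512, 4), (14416, 4), (4248, 4), (13884, 4), (8472, 5),
    (10804, 4), (4100, 3), (9020, 4), (8376, 3), (4884, 3), (13440, 4),
    (6736, 4), (9704, 4), (6008, 3), (2512, 4), (6000, 4), (1548, 2),
    (56216, 10), (6804, 4), (4872, 4), (10996, 6), (2360, 2), (144216, 18),
    (4260, 4),
]


def summarise(episodes, buffer_kb=16 * 1024):
    """Summarise out-of-sync episodes against a hypothetical buffer size.

    buffer_kb is an assumption for illustration (16 MB by default); it is
    not a DRBD setting.
    """
    sizes = [size for size, _ in episodes]
    return {
        "episodes": len(sizes),
        "median_kb": median(sizes),
        "largest_kb": max(sizes),
        # Episodes whose reported out-of-sync size fits in buffer_kb.
        "within_buffer": sum(1 for size in sizes if size <= buffer_kb),
    }


if __name__ == "__main__":
    for key, value in summarise(EPISODES).items():
        print(f"{key}: {value}")
```

Changing `buffer_kb` shows how many of the logged episodes a larger or smaller amount of extra buffering might have absorbed, under the assumption stated above.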
Regards,
Adam

> ----- Original Message -----
> From: Adam Goryachev [mailto:mailinglists at websitemanagers.com.au]
> Sent: Wednesday, December 19, 2012 05:01 PM
> To: drbd-user at lists.linbit.com <drbd-user at lists.linbit.com>
> Subject: [DRBD-user] Secondary Performance
>
> Hi,
>
> I've been having a problem with performance of DRBD while the secondary
> is connected, and I've tried various things (which have helped) but not
> resolved the issue. So last night I upgraded to DRBD 8.3.15 and have
> been running with that for the past 6 hours or so. I also enabled the
> option:
> on-congestion pull-ahead;
> and changed my protocol to A (because I couldn't enable that option
> otherwise).
>
> Since then, I've received the following log messages:
> Dec 20 08:16:57 san1 kernel: [1235131.854167] block drbd2: Congestion-extents threshold reached
> Dec 20 08:16:57 san1 kernel: [1235131.854172] block drbd2: conn( Connected -> Ahead ) pdsk( UpToDate -> Consistent )
> Dec 20 08:17:00 san1 kernel: [1235134.535128] block drbd2: helper command: /sbin/drbdadm before-resync-source minor-2
> Dec 20 08:17:00 san1 kernel: [1235134.536144] block drbd2: helper command: /sbin/drbdadm before-resync-source minor-2 exit code 0 (0x0)
> Dec 20 08:17:00 san1 kernel: [1235134.536148] block drbd2: conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent )
> Dec 20 08:17:00 san1 kernel: [1235134.536151] block drbd2: Began resync as SyncSource (will sync 8756 KB [2189 bits set]).
> Dec 20 08:17:00 san1 kernel: [1235134.543422] block drbd2: updated sync UUID 75FEFEE50FF154E9:59B7F817FEF13BD6:8E57A8552E6B2153:8E56A8552E6B2153
> Dec 20 08:17:01 san1 kernel: [1235135.764082] block drbd2: Resync done (total 1 sec; paused 0 sec; 8756 K/sec)
> Dec 20 08:17:01 san1 kernel: [1235135.764086] block drbd2: updated UUIDs 75FEFEE50FF154E9:0000000000000000:59B7F817FEF13BD6:8E57A8552E6B2153
> Dec 20 08:17:01 san1 kernel: [1235135.764091] block drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> Dec 20 08:17:01 san1 kernel: [1235135.828036] block drbd2: bitmap WRITE of 0 pages took 0 jiffies
> Dec 20 08:17:01 san1 kernel: [1235135.828038] block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
>
> How might I read this information to determine what the shortfall in
> performance is? It looks like there were some blocks (or portions of
> blocks), totalling 8756 KB, that the secondary fell behind on. This
> doesn't seem like a lot to fall behind, especially considering that
> everything was caught up again in around 4 seconds.
>
> Is there perhaps some additional buffering or tuning I could do within
> DRBD (or on the secondary) to make an additional 16M worth of buffering
> available?
>
> Of course, the above is just one instance of falling behind, and if it
> happens frequently enough, or by a large enough value, then the only
> solution will be to upgrade the hardware in the machine. However, if it
> is only falling behind by a small amount then I'd rather just increase
> the buffering a bit.
>
> Regards,
> Adam
>

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
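One possible way to reduce kernel log lines like those quoted above to size/time pairs is sketched below. It is not the method actually used for the summary in this thread. The helper names (`parse_stamp`, `summarise_log`) are hypothetical, and the sketch assumes each log message sits on a single unwrapped line, that only one DRBD device is logging, and that the year (absent from syslog timestamps) is supplied by the caller.

```python
import re
from datetime import datetime

# Patterns taken from the message texts in the quoted log excerpt.
AHEAD = re.compile(r"conn\( Connected -> Ahead \)")
SYNC_SIZE = re.compile(r"will sync (\d+) KB")
BACK_IN_SYNC = re.compile(r"conn\( SyncSource -> Connected \)")
STAMP = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2})")


def parse_stamp(line, year):
    """Parse the leading syslog timestamp; the year is an assumption."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def summarise_log(lines, year=2012):
    """Return (size_kB, seconds) per Ahead -> back-in-sync episode."""
    episodes = []
    started = None
    size_kb = None
    for line in lines:
        if AHEAD.search(line):
            started, size_kb = parse_stamp(line, year), None
        elif started is None:
            continue  # ignore everything outside an episode
        elif (match := SYNC_SIZE.search(line)) is not None:
            size_kb = int(match.group(1))
        elif BACK_IN_SYNC.search(line):
            elapsed = parse_stamp(line, year) - started
            episodes.append((size_kb, int(elapsed.total_seconds())))
            started = None
    return episodes


if __name__ == "__main__":
    sample = [
        "Dec 20 08:16:57 san1 kernel: [1235131.854172] block drbd2: "
        "conn( Connected -> Ahead ) pdsk( UpToDate -> Consistent )",
        "Dec 20 08:17:00 san1 kernel: [1235134.536151] block drbd2: "
        "Began resync as SyncSource (will sync 8756 KB [2189 bits set]).",
        "Dec 20 08:17:01 san1 kernel: [1235135.764091] block drbd2: "
        "conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )",
    ]
    print(summarise_log(sample))
```

With several DRBD devices in the same log, the episodes would need to be tracked per device name (for example `drbd2`), which this sketch does not attempt.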