[DRBD-user] performance issues during a resync

Louis Munro lmunro at inverse.ca
Mon Jun 27 18:18:15 CEST 2016

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello,

I manage a pair of servers in two separate datacenters that replicate a device (backing a MySQL database) between the two.

Knowing that there was a storage issue in one of the datacenters that meant that I/O was sometimes blocking for a few minutes, I set the replication protocol to A.
That way (or so I believed), when the storage felt like acting up in our secondary datacenter the worse that could happen was that replication would lag, or even fail for a few minutes and then reconnect and catch up.

Last Thursday we experienced a situation when the storage problem did crop up, drbd connection between the two DCs was broken and then replication restarted.
The problem I faced was that during that replication, I/O latency seems to have been downgraded on both primary and secondary servers, just like I would have expected (and had previously experienced) using protocol C.
Note that I/O was still slowed down in my secondary DC while the resync was taking place.

It’s as though, while replication is asynchronous, resynchronizations are actually synchronous.
Am I missing something? 
How can I avoid resyncs dragging down my I/O latency on the primary if the secondary is dragging it’s feet? 

Some details and logs follow.


This is a pair of CentOS 6 servers, running drbd 8.4.5 on a 2.6.32-573.18.1 kernel.


I am running with the following common configuration: 

common {
    protocol A;

    handlers {

    }

    startup {
        degr-wfc-timeout 120;
    }

    syncer {
        rate 500M;
        al-extents 257;
    }
    options {
    }

    disk {
        on-io-error     detach;
    }

    net {
		verify-alg md5;
    }
}


This is what I have in the logs for the primary server: 

Jun 23 15:53:18 pf-db1 kernel: block drbd0: Remote failed to finish a request within ko-count * timeout
Jun 23 15:53:18 pf-db1 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: asender terminated
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: Terminating drbd_a_mysql
Jun 23 15:53:18 pf-db1 kernel: block drbd0: new current UUID C7BC9B26D8B411FB:5386F0E7BDD5DE27:A561B9F4409AEACD:A560B9F4409AEACD
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: Connection closed
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: conn( Timeout -> Unconnected )
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: receiver terminated
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: Restarting receiver thread
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: receiver (re)started
Jun 23 15:53:18 pf-db1 kernel: drbd mysql: conn( Unconnected -> WFConnection )
Jun 23 15:53:22 pf-db1 kernel: drbd mysql: Handshake successful: Agreed network protocol version 101
Jun 23 15:53:22 pf-db1 kernel: drbd mysql: Agreed to support TRIM on protocol level
Jun 23 15:53:22 pf-db1 kernel: drbd mysql: conn( WFConnection -> WFReportParams )
Jun 23 15:53:22 pf-db1 kernel: drbd mysql: Starting asender thread (from drbd_r_mysql [20164])
Jun 23 15:53:22 pf-db1 kernel: block drbd0: drbd_sync_handshake:
Jun 23 15:53:22 pf-db1 kernel: block drbd0: self C7BC9B26D8B411FB:5386F0E7BDD5DE27:A561B9F4409AEACD:A560B9F4409AEACD bits:1151 flags:0
Jun 23 15:53:22 pf-db1 kernel: block drbd0: peer 5386F0E7BDD5DE26:0000000000000000:A561B9F4409AEACC:A560B9F4409AEACD bits:0 flags:0
Jun 23 15:53:22 pf-db1 kernel: block drbd0: uuid_compare()=1 by rule 70
Jun 23 15:53:22 pf-db1 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Jun 23 15:53:22 pf-db1 kernel: block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 459(1), total 459; compression: 100.0%
Jun 23 15:53:25 pf-db1 kernel: block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 459(1), total 459; compression: 100.0%
Jun 23 15:53:25 pf-db1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
Jun 23 15:53:26 pf-db1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Jun 23 15:53:26 pf-db1 kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Jun 23 15:53:26 pf-db1 kernel: block drbd0: Began resync as SyncSource (will sync 6952 KB [1738 bits set]).
Jun 23 15:53:26 pf-db1 kernel: block drbd0: updated sync UUID C7BC9B26D8B411FB:5387F0E7BDD5DE27:5386F0E7BDD5DE27:A561B9F4409AEACD
Jun 23 15:53:38 pf-db1 kernel: drbd mysql: [drbd_w_mysql/7339] sock_sendmsg time expired, ko = 6
Jun 23 15:54:27 pf-db1 kernel: drbd mysql: [drbd_w_mysql/7339] sock_sendmsg time expired, ko = 6
Jun 23 15:54:33 pf-db1 kernel: drbd mysql: [drbd_w_mysql/7339] sock_sendmsg time expired, ko = 5
Jun 23 15:54:47 pf-db1 kernel: drbd mysql: [drbd_w_mysql/7339] sock_sendmsg time expired, ko = 4
Jun 23 15:55:03 pf-db1 kernel: drbd mysql: [drbd_w_mysql/7339] sock_sendmsg time expired, ko = 6
Jun 23 15:55:09 pf-db1 kernel: drbd mysql: [drbd_w_mysql/7339] sock_sendmsg time expired, ko = 5
Jun 23 15:55:15 pf-db1 kernel: drbd mysql: [drbd_w_mysql/7339] sock_sendmsg time expired, ko = 4
Jun 23 15:58:37 pf-db1 kernel: block drbd0: Resync done (total 310 sec; paused 0 sec; 20 K/sec)
Jun 23 15:58:37 pf-db1 kernel: block drbd0: updated UUIDs C7BC9B26D8B411FB:0000000000000000:5387F0E7BDD5DE27:5386F0E7BDD5DE27
Jun 23 15:58:37 pf-db1 kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )


This is sar reporting on block devices while the problem was happening: 

# sar -f /var/log/sa/sa23 -dp -s 15:43:00 -e 16:09:00 | awk '$3 ~ /DEV/ || $3 ~ /drbd0/'
03:43:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
03:44:01 PM     drbd0     51.60      0.00   1601.61     31.04      0.04      0.83      0.19      0.99
03:45:01 PM     drbd0     21.44      0.00    571.93     26.68      1.14     51.26     40.69     87.25
03:46:01 PM     drbd0     17.72      0.00    406.49     22.94      1.20     65.60     53.99     95.65
03:47:01 PM     drbd0     13.04      0.00    217.15     16.66      1.43    113.87     76.94    100.31
03:48:01 PM     drbd0      5.16      0.00     85.15     16.49      1.30    244.81    191.23     98.73
03:49:02 PM     drbd0      6.88      0.00     97.98     14.23      1.30    193.08    145.69    100.29
03:50:01 PM     drbd0     10.53      0.00    180.82     17.18      1.51    120.57     95.35    100.38
03:51:01 PM     drbd0      6.20      0.00     96.33     15.55      1.47    279.11    161.82    100.26
03:52:01 PM     drbd0      9.71      0.00    183.82     18.92      1.36    118.30    103.47    100.51
03:53:01 PM     drbd0      2.43      0.00     37.73     15.56      1.94    232.62    414.01    100.40
03:54:01 PM     drbd0     12.55      0.00    398.39     31.75      8.01    736.01     69.56     87.28
03:55:01 PM     drbd0      3.06      0.00     29.31      9.57    128.10    675.81    328.01    100.43
03:56:01 PM     drbd0      1.34      0.00     39.36     29.40     43.34 126599.23    750.34    100.45
03:57:01 PM     drbd0      5.36      0.00    128.97     24.05      1.60    258.42    186.98    100.27
03:58:01 PM     drbd0     14.02      0.00    287.79     20.53      3.30    264.49     71.71    100.52
03:59:01 PM     drbd0     23.30      0.00    654.19     28.08      2.32    103.39     42.16     98.23
04:00:01 PM     drbd0     40.33      0.00   1390.08     34.47      3.56     88.24     19.73     79.58
04:01:01 PM     drbd0     37.54      0.00   1225.94     32.66      3.62     96.50     22.01     82.64
04:02:01 PM     drbd0     47.52      0.00   1602.41     33.72      2.41     50.75      8.77     41.68
04:03:01 PM     drbd0     60.49      0.00   2000.13     33.07      1.72     28.36      3.88     23.45
04:04:01 PM     drbd0     67.02      0.00   2297.37     34.28      1.01     15.12      1.40      9.39
04:05:01 PM     drbd0     59.32      0.00   1996.30     33.65      0.89     15.09      1.82     10.77
04:06:01 PM     drbd0     58.68      0.00   2023.81     34.49      1.55     26.38      3.58     20.99
04:07:01 PM     drbd0     57.96      0.00   1989.85     34.33      0.98     16.98      1.71      9.92
04:08:01 PM     drbd0     55.01      0.00   1872.40     34.03      0.99     17.99      2.14     11.80



Any help is greatly appreciated.

Best regards,
--
Louis Munro
lmunro at inverse.ca  ::  www.inverse.ca 
+1.514.447.4918 x125  :: +1 (866) 353-6153 x125
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence (www.packetfence.org)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20160627/3d1237a4/attachment.htm>


More information about the drbd-user mailing list