[DRBD-user] High iowait on primary DRBD node with large sustained writes and replication enabled to secondary

Fri Jan 11 12:08:02 CET 2013

On 11.01.2013 04:36, Paul Freeman wrote:
> Sebastian,
> I have collected some tcpdump logs on the internode link used for drbd while writing 100M to the primary drbd node using dd (as before) and with the relevant drbd resource connected.
> 
> I am not very proficient at interpreting the tcpdump traffic so please bear with me:-)
> 
> I ran tcpdump with the options "-vvv -s0 tcp port 7798".  Port 7798 is the port used by this drbd resource.  I subsequently used wireshark to identify the tcp stream associated with the replication traffic and set an appropriate filter to extract those packets and save them to a separate packet capture file.
> 
> My first guess is the segment ACK RTT may be relevant to our investigation.  When I extract the segment ACKs from the secondary node using tshark and tcp.analysis.ack_rtt then process them in MS Excel, the results are as follows:
> 
> Number of segment ACKs:	5255
> Average RTT: 1.87ms
> Minimum RTT: 0.078ms
> Maximum RTT: 41.45ms (actually the last ACK)
> 
> The time for dd to complete was 950ms.
> 
> Does this provide any information on the latency problem?

It would be better to write locally on the primary, so that we see only
the network latency of the inter-node connection and not that of the
iSCSI connection as well.

I had a deeper look at your blktrace output.

On the DRBD device, the time from queuing the first 128 KiB chunk to
queuing the 9th 128 KiB chunk (i.e. 1 MB written) is:

disconnected  connected
1.33 ms       8.86 ms

Difference: 7.53 ms.

The latency between queuing (Q) the first IO on the DRBD device and
dispatching (D) it on the SCSI disk below also changed. This is the
change in DRBD's own latency:

disconnected  connected
8.7 us        12.8 us

Difference: 4.1 us per IO. 4.1 us * 8 chunks = 32.8 us per 1 MB. This is
negligible.

So the time the packets spend in the network layer should be slightly
less than 7.5 ms per 1 MB transferred. This is the additional time that
hurts you when DRBD is connected.
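
For anyone who wants to redo this arithmetic with their own blktrace
numbers, here is a minimal Python sketch. The variable names are made up
for illustration, and it assumes that 1 MB is written as 8 chunks of
128 KiB, as in the trace above:

```python
# Figures taken from the blktrace comparison above.
# Assumption: 1 MB is written as 8 chunks of 128 KiB.
CHUNKS_PER_MB = 8

# Time from queuing the 1st chunk to queuing the 9th chunk, in ms.
disconnected_ms = 1.33
connected_ms = 8.86

# Q -> D latency of a single IO, in microseconds.
q_to_d_disconnected_us = 8.7
q_to_d_connected_us = 12.8

# Extra time per 1 MB when DRBD is connected.
extra_per_mb_ms = connected_ms - disconnected_ms

# Part of that extra time spent inside DRBD itself (Q -> D change).
drbd_internal_ms = (
    (q_to_d_connected_us - q_to_d_disconnected_us) * CHUNKS_PER_MB / 1000.0
)

# What is left over is attributed to the inter-node network path.
network_side_ms = extra_per_mb_ms - drbd_internal_ms

print(f"extra time per MB when connected: {extra_per_mb_ms:.2f} ms")
print(f"of which DRBD-internal:           {drbd_internal_ms:.4f} ms")
print(f"remaining (network side):         {network_side_ms:.2f} ms")
```

The DRBD-internal share is so small that almost the whole difference
ends up on the network side.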

Please run blktrace on the iSCSI client, once with the DRBD primary
disconnected and once with it connected. Then we'll see which of the two
chained network connections is faster.

This is what we know so far for 1 MB:
iSCSI path:         ? ms
SCSI/disk:          1.33 ms
between DRBD nodes: 7.53 ms

Overall latency = iSCSI path + SCSI/disk + between DRBD nodes.
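
As a rough sketch of this sum in Python (the function name is
hypothetical, and the iSCSI value used in the example call is only a
placeholder, not a measurement; the real figure has to come from the
blktrace on the iSCSI client):

```python
from typing import Optional


def overall_latency_ms(
    iscsi_ms: Optional[float],
    disk_ms: float = 1.33,
    drbd_link_ms: float = 7.53,
) -> Optional[float]:
    """Sum the three per-MB latency components.

    Returns None while the iSCSI part has not been measured yet.
    """
    if iscsi_ms is None:
        return None
    return iscsi_ms + disk_ms + drbd_link_ms


# The part that is already known from the blktrace on the DRBD primary.
known_part_ms = 1.33 + 7.53
print(f"known so far: {known_part_ms:.2f} ms + iSCSI path")

# Placeholder value only, to show how the sum is used once measured.
print(overall_latency_ms(2.0))
```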

Cheers,
Sebastian