[DRBD-user] High iowait on primary DRBD node with large sustained writes and replication enabled to secondary

Paul Freeman paul.freeman at emlair.com.au
Fri Jan 11 04:36:25 CET 2013



> -----Original Message-----
> From: Sebastian Riemer [mailto:sebastian.riemer at profitbricks.com]
> Sent: Wednesday, 9 January 2013 9:35 PM
> To: Paul Freeman
> Cc: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] High iowait on primary DRBD node with large
> sustained writes and replication enabled to secondary
> 
> On 09.01.2013 04:51, Paul Freeman wrote:
> ...
> > Analysis:
> 
> Hmm, tcpdump could help. You'll get a time stamp for the outgoing
> packets and the incoming completions. Included is the latency of DRBD
> and the storage on the receiver side. But at least you don't see the
> latency of DRBD on the sender side. So with that you could see if there
> is a bigger issue before the network layer.
> 
Sebastian,
I have collected some tcpdump logs on the inter-node link used for DRBD while writing 100M to the primary DRBD node using dd (as before), with the relevant DRBD resource connected.

I am not very proficient at interpreting tcpdump traffic, so please bear with me :-)

I ran tcpdump with the options "-vvv -s0 tcp port 7798"; port 7798 is the port used by this DRBD resource.  I then used Wireshark to identify the TCP stream carrying the replication traffic, filtered on that stream, and saved those packets to a separate packet capture file.

My first guess is that the segment ACK RTT may be relevant to our investigation.  When I extract the segment ACKs from the secondary node using tshark and the tcp.analysis.ack_rtt field, then process them in MS Excel, the results are as follows:

Number of segment ACKs: 5255
Average RTT: 1.87ms
Minimum RTT: 0.078ms
Maximum RTT: 41.45ms (actually the last ACK)
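
For anyone wanting to reproduce the summary above without a spreadsheet, the following is a minimal sketch only. It assumes the RTT values have been exported one per line, in seconds, which is how tshark normally prints the tcp.analysis.ack_rtt field with "-T fields -e tcp.analysis.ack_rtt". The function name summarize_ack_rtt and the sample values are hypothetical and are not taken from my capture.

```python
def summarize_ack_rtt(lines):
    """Summarize ACK RTT samples given as text lines holding seconds.

    Blank lines (packets without an ack_rtt value) are skipped.
    Returns the sample count and the mean, minimum and maximum in ms.
    """
    rtts_ms = []
    for line in lines:
        value = line.strip()
        if value:
            # Convert seconds to milliseconds.
            rtts_ms.append(float(value) * 1000.0)
    if not rtts_ms:
        raise ValueError("no RTT samples found")
    return {
        "count": len(rtts_ms),
        "mean_ms": sum(rtts_ms) / len(rtts_ms),
        "min_ms": min(rtts_ms),
        "max_ms": max(rtts_ms),
    }


# Illustrative input only; real input would be the exported field values.
sample = ["0.000100", "", "0.000300", "0.002000"]
stats = summarize_ack_rtt(sample)
print("ACKs: {count}  mean: {mean_ms:.3f} ms  "
      "min: {min_ms:.3f} ms  max: {max_ms:.3f} ms".format(**stats))
```

In practice the list would be replaced by an open file or sys.stdin carrying the exported values.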

The time for dd to complete was 950ms.
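
One rough sanity check on these figures, offered as a sketch and not a conclusion: if every ACK round trip had to finish before the next segment was sent, the RTTs would add up to roughly 5255 x 1.87ms, or about 9.8s, which is far longer than the 950ms dd took. The snippet below just does that arithmetic. It assumes that "100M" means 100 MiB and that the capture covers only the dd run; the function names are hypothetical.

```python
def overlap_factor(ack_count, mean_rtt_ms, elapsed_ms):
    """Ratio of summed ACK RTTs to wall-clock time.

    A value well above 1 suggests many segments were in flight at once,
    so the mean RTT cannot simply be added up per segment.
    """
    return (ack_count * mean_rtt_ms) / elapsed_ms


def throughput_mib_per_s(size_mib, elapsed_ms):
    """Average write throughput in MiB/s over the elapsed time."""
    return size_mib / (elapsed_ms / 1000.0)


# Figures from this message; 100 MiB is an assumption about dd's "100M".
acks, mean_rtt_ms, elapsed_ms = 5255, 1.87, 950.0
print("summed RTT / elapsed: %.1f" % overlap_factor(acks, mean_rtt_ms, elapsed_ms))
print("throughput: %.1f MiB/s" % throughput_mib_per_s(100, elapsed_ms))
print("mean spacing between ACKs: %.3f ms" % (elapsed_ms / acks))
```

If the ratio is well above 1, the average ACK RTT on its own probably says little about per-write latency, since the ACKs overlap heavily in time.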

Does this provide any information on the latency problem?

I am not convinced I am looking at the appropriate component of the TCP traffic.  If not, can you please advise and I will re-analyse?  I can supply the tcpdump logs if required (off-list).

Thanks

Paul


