[DRBD-user] High iowait on primary DRBD node with large sustained writes and replication enabled to secondary

Paul Freeman paul.freeman at emlair.com.au
Mon Jan 7 22:51:33 CET 2013

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Firstly I want to say thank you to the developers and maintainers of DRBD for a great application.  I have been using it in production for a couple of years and it has worked extremely well.

This is a lengthy post as I wanted to give a reasonable amount of detail describing my system and the problem.  I trust this is OK and that you will persist with reading :-)

Some background on my system:
I have an iSCSI storage system comprising two identical servers, each with a 3TB RAID10 array (6 x 1TB enterprise-grade SATA II disks on a 3Ware 9690SA controller with BBU), which were running DRBD (v8.3.7) over LVM in a primary-secondary configuration.  The host OS was Ubuntu 10.04 LTS, using the default DRBD packages provided by that distribution.

The two servers were upgraded to Ubuntu 12.04 LTS last week which includes kernel 3.2.0-35 and DRBD 8.3.11.  The upgrade went smoothly and DRBD is using the same configuration files as I had created for 10.04 LTS.

The deadline I/O scheduler is in use, not the default CFQ.
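For reference, the active scheduler can be confirmed (and switched) per backing block device like this; sdX below is a placeholder for the actual device behind the 3Ware controller:

    # Show the current scheduler; the active one is shown in brackets,
    # e.g. "noop [deadline] cfq"
    cat /sys/block/sdX/queue/scheduler

    # Switch to deadline at runtime if it is not already active
    echo deadline > /sys/block/sdX/queue/scheduler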

The iSCSI storage is configured with LVM to give three volume groups and ten logical volumes.  The storage holds file server volumes (for a Windows 2008 R2 server) and virtual guest storage for a Proxmox v2.2 KVM environment.  The two servers have a private, dedicated bonded (round-robin) dual-port NIC (Intel Pro/1000 PT) connection for DRBD replication.
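The bonding mode and slave state of that replication link can be checked via procfs; bond0 below is a placeholder for the actual bond interface name:

    # Should report "Bonding Mode: load balancing (round-robin)"
    # and list both Intel Pro/1000 PT ports as slaves
    cat /proc/net/bonding/bond0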

My problem:
During prolonged writes (approx. 3-6 minutes) from a virtual guest restore initiated from the Proxmox virtual host, iowait (and subsequently load average) increases on the Proxmox and primary iSCSI/DRBD servers to a point where SCSI timeouts occur both for the Proxmox server and any virtual guests running at the time of the restore.

Problem details:
I needed to restore some Proxmox KVM virtual guests from backups to their original logical volumes on the iSCSI storage.  The process involves decompressing the virtual guest backup on the Proxmox server and then using dd to copy the image to a logical volume created by Proxmox on the iSCSI storage LUN.
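In rough outline the restore amounts to something like the following; the backup file name and logical volume path are placeholders, and the exact commands are generated by Proxmox:

    # Decompress the backup on the Proxmox host and stream the raw
    # image onto the iSCSI-backed logical volume with dd
    gunzip -c /path/to/vm-100-backup.img.gz \
      | dd of=/dev/vg_iscsi/vm-100-disk-1 bs=1M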

During this restore (an image of approx. 40GB), iowait on the primary iSCSI server is initially low (<1%) but after approx. 15-30 sec climbs to about 75% (averaged over 8 cores) and stays there.  The load average also climbs, and eventually the Proxmox host and the virtual guests sharing the iSCSI storage start getting SCSI timeouts and locking up.
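Figures like these are easy to watch live on the primary node with the usual tools, something along the lines of:

    # CPU breakdown including iowait, refreshed every 2 seconds
    vmstat 2

    # Per-device utilisation, queue size and await times (sysstat package)
    iostat -x 2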

This behaviour is reproducible.

When not performing a restore from the Proxmox virtual host to the iSCSI/DRBD storage the system is performing very well.

Analysis of the problem:
I have spent some time investigating this to try and determine why iowait is so high in this scenario.  I have found the following.

1. If the resource being used is connected to its peer on the secondary (the normal primary-secondary DRBD configuration), iowait climbs to approx. 75% after approx. 15-30 sec.  The write speed from the Proxmox host to the primary iSCSI/DRBD node is approx. 75 MBytes/sec and the replication bond link runs at approx. 650 Mbits/sec.

2. If I disconnect that resource on the primary iSCSI/DRBD node, the high iowait does not occur at all.  Write performance from the Proxmox host to the primary iSCSI node is basically wire speed (110-120 MBytes/sec).  (The drbdadm commands involved are sketched after this list.)

3. If I then reconnect the resource, synchronization starts as expected and runs successfully (syncer rate set at 150M) with the bond running at approx. 1700 Mbits/sec.
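The disconnect/reconnect in points 2 and 3 is just the standard drbdadm invocations; r0 below stands in for the actual resource name:

    # Take the resource out of connected (replicating) mode
    drbdadm disconnect r0

    # Later, reconnect it; resync to the secondary starts automatically
    drbdadm connect r0

    # Watch connection state and resync progress
    cat /proc/drbd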

The results suggest that the DRBD layer itself is not adding much overhead when running in StandAlone mode.  However, when running in connected mode (Protocol C), something is going on that causes the high iowait.

In connected mode, even though the incoming network connection from the Proxmox server runs at approx. 920 Mbits/sec, the bonded network connection between the DRBD nodes only runs at approx. 600 Mbits/sec.  When the resource is resynchronizing (point 3 above) the bond runs at approx. 1800 Mbits/sec.
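The per-interface rates above can be watched directly on the bond and on the iSCSI-facing NIC, for example with sysstat:

    # Throughput per interface, refreshed every 2 seconds
    sar -n DEV 2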

This may not actually be a DRBD problem but rather some other IO problem or interaction; I just can't work out what at this point.

My hunch is that the initial delay and then the climb in iowait are related to IO buffers filling up somewhere in the OS/iSCSI/DRBD/network layers.
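If that is right, it should show up as dirty/writeback pages piling up on the primary while the restore runs; something along these lines makes it visible, together with the kernel writeback thresholds:

    # Watch dirty and writeback page totals during the restore
    watch -n 2 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

    # Current writeback thresholds
    sysctl vm.dirty_ratio vm.dirty_background_ratio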

I am using a fairly standard DRBD config and can supply details if required.  I have tried increasing max-buffers and max-epoch-size to 8000 and setting sndbuf-size to 0 (autotune), but these have not made much impact, if any.  I wanted to keep the posting as short as I could.
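For reference, the tuning I tried corresponds to settings along these lines in the resource file (r0 is a placeholder for the actual resource name; the values are the ones mentioned above):

    resource r0 {
      net {
        max-buffers     8000;
        max-epoch-size  8000;
        sndbuf-size     0;      # 0 = let DRBD autotune the send buffer
      }
      syncer {
        rate 150M;              # resync rate referred to in point 3
      }
    }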

I have come across a few references to similar behaviour on the net but have not found a solution that appears relevant to my situation.

Any comments and suggestions would be welcome.

Regards

Paul


