[DRBD-user] Dual-primary/ Very slow synchronization

Lars Ellenberg lars.ellenberg at linbit.com
Fri May 20 10:22:48 CEST 2011


On Fri, May 20, 2011 at 10:12:29AM +0200, Daniel Meszaros wrote:
> Hi!
> 
> On 19.05.2011 21:16, Digimer wrote:
> >As Felix stated, try 10M. If it gets up to that speed (and it can take
> >a while, be patient), then bump it to 20M, etc.
> 
> I tried syncing with 10M last night and it was indeed faster than
> before, running at around 10MB/s. Using "drbdsetup /dev/drbd0 syncer -r
> 20M" and so on, I could increase the sync speed up to 70M ... then it
> was interrupted and a new bitmap check started.
> 
> While doing so I noticed some kernel error messages in my virtual
> guests on the XenServer:
> 
> Linux:
> [ 2040.064272] INFO: task exim4:2336 blocked for more than 120 seconds.
> [ 2040.064281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 
> Windows 2003 SBS:
> NTDS (432) NTDSA: A request to read from the file
> "C:\WINDOWS\NTDS\ntds.dit" at offset 4661248 (0x0000000000472000)
> for 8192 (0x00002000) bytes succeeded, but took an abnormally long
> time (406 seconds) to be serviced by the OS. In addition, 0 other
> I/O requests to this file have also taken an abnormally long time
> to be serviced since the last message regarding this problem was
> posted 539 seconds ago. This problem is likely due to faulty
> hardware. Please contact your hardware vendor for further
> assistance diagnosing the problem.
> 
> While this is happening, these machines are not reachable on the
> network. Therefore I stopped the synchronization and shut down the
> machine that is out of sync ... which brought the services back to
> normal.
> 
> For some reason the DRBD sync appears to consume too much I/O
> bandwidth, even when it is running at 10M. I must admit that the
> last full sync I can remember was before I had these machines set
> up and running.
> 
> Searching Google, I found a message from this list suggesting that
> "scst vdisk_blockio" be changed to "vdisk_fileio"; however, I do
> not use SCST.
> 
> Any ideas on non-SCST systems? :-/
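
The stepwise ramp-up described in the quoted message (10M, then 20M, up to 70M) can be sketched as a small Python helper. This is an editorial sketch, not part of the original thread: `ramp_commands` is a hypothetical name, and it only builds the DRBD 8.3-style `drbdsetup <device> syncer -r <rate>` command lines quoted in the message; it does not run them or watch `/proc/drbd` between steps.

```python
"""Build a stepped list of DRBD 8.3-style resync-rate commands (a sketch)."""
import shlex


def ramp_commands(device: str, start_mb: int, step_mb: int, limit_mb: int) -> list[str]:
    """Return one `drbdsetup <device> syncer -r <N>M` command per step.

    Hypothetical helper: steps run from start_mb up to at most limit_mb.
    It only builds strings; nothing is executed here.
    """
    if start_mb <= 0 or step_mb <= 0 or limit_mb < start_mb:
        raise ValueError("need 0 < start_mb <= limit_mb and step_mb > 0")
    commands = []
    for rate in range(start_mb, limit_mb + 1, step_mb):
        argv = ["drbdsetup", device, "syncer", "-r", f"{rate}M"]
        commands.append(shlex.join(argv))  # shell-safe quoting of each argument
    return commands


if __name__ == "__main__":
    # Values taken from the message: /dev/drbd0, 10M steps, stopping at 70M.
    for cmd in ramp_commands("/dev/drbd0", start_mb=10, step_mb=10, limit_mb=70):
        print(cmd)
```

In line with Digimer's advice, each step would be held until the resync actually reaches that rate, and the ramp stopped (or rolled back) as soon as the guests start logging hung-task or slow-I/O warnings like the ones quoted.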

According to you, it "had been working" before with the exact same
hardware and configuration.

Now, if it no longer does, I strongly suspect network problems.

Did you do some network benchmarks on the replication link recently?
Packet loss, excessive retransmits, checksum errors, bad cabling?
Port stats on the switch, any error counters?
Duplex or other auto-negotiation mismatch?
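
To look for the excessive retransmits and checksum errors mentioned above on a Linux node, one option is to diff the TCP counters in `/proc/net/snmp` before and after a resync or benchmark run. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the counters are host-wide (all TCP traffic, not just the replication link), and the `InCsumErrors` column only exists on newer kernels. It complements, rather than replaces, the switch port statistics.

```python
"""Diff host-wide TCP counters from Linux /proc/net/snmp (a sketch)."""
from pathlib import Path


def parse_snmp(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/snmp: each protocol has a header line, then a value line."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    tables: dict[str, dict[str, int]] = {}
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")  # "Tcp:" -> "Tcp"
        tables[proto] = {name: int(v) for name, v in zip(header[1:], values[1:])}
    return tables


def tcp_snapshot(path: str = "/proc/net/snmp") -> dict[str, int]:
    """Return the Tcp counter table of the running kernel (Linux only)."""
    return parse_snmp(Path(path).read_text())["Tcp"]


def tcp_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Summarise how the TCP counters moved between two snapshots."""
    out_segs = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return {
        "out_segs": out_segs,
        "retrans_segs": retrans,
        "retrans_pct": 100.0 * retrans / out_segs if out_segs else 0.0,
        "in_errs": after["InErrs"] - before["InErrs"],
        # Older kernels lack InCsumErrors; a missing column counts as 0.
        "in_csum_errors": after.get("InCsumErrors", 0) - before.get("InCsumErrors", 0),
    }
```

Usage: take `before = tcp_snapshot()`, let the sync or an iperf run proceed for a while, then `print(tcp_delta(before, tcp_snapshot()))`. A retransmit percentage that stays clearly above zero on a dedicated replication link would be worth chasing down along the cabling, duplex and switch checks listed above.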

Use flood ping with both large (>32k) and small packet sizes, iperf,
or your favorite network integrity checker or benchmarking tool...
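
The flood-ping check suggested here can be scripted. Another hedged Python sketch: it assumes the Linux iputils `ping` (`-f` floods and usually needs root, `-s` sets the payload size, `-c` the packet count, `-q` prints only the summary), and the sizes in `PAYLOAD_SIZES` are illustrative picks, not values from the thread: a small packet, a full 1500-byte-MTU frame (1472 bytes of payload plus 8 bytes of ICMP and 20 bytes of IP header), and one above 32k that forces IP fragmentation. The function names are hypothetical.

```python
"""Flood-ping a peer with several payload sizes and report packet loss (a sketch)."""
import re
import subprocess

# Illustrative sizes: small, one full 1500-byte-MTU frame, and one > 32k (fragmented).
PAYLOAD_SIZES = (56, 1472, 40000)

# Matches both iputils ("N received") and BSD ("N packets received") summaries.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def flood_ping_argv(host: str, size: int, count: int = 10000) -> list[str]:
    """iputils ping: -q summary only, -f flood, -s payload bytes, -c packet count."""
    return ["ping", "-q", "-f", "-s", str(size), "-c", str(count), host]


def packet_loss(ping_output: str) -> float:
    """Compute lost/transmitted as a percentage from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    return 100.0 * (sent - received) / sent if sent else 0.0


def run_checks(host: str) -> dict[int, float]:
    """Run one flood ping per payload size; ping exits non-zero on loss, so no check=True."""
    results = {}
    for size in PAYLOAD_SIZES:
        proc = subprocess.run(flood_ping_argv(host, size), capture_output=True, text=True)
        results[size] = packet_loss(proc.stdout)
    return results
```

Usage: run `run_checks("<peer replication address>")` as root on one node. For throughput, iperf is typically started as `iperf -s` on one node and `iperf -c <peer>` on the other. Any loss at all on a dedicated back-to-back replication link would be worth investigating.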


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


