[DRBD-user] drbd on top of md: high load on primary during secondary resync

Kaloyan Kovachev kkovachev at varna.net
Tue Feb 14 15:50:59 CET 2012



Hi,

On Tue, 14 Feb 2012 13:28:10 +0100, Walter Haidinger
<walter.haidinger at gmx.at> wrote:
> Hi!
> 
> Coincidentally another problem of drbd on top of md.
> 
> I'm running drbd 8.3.12 on top of a 3-way software-raid1
> (kernel 2.6.35.14, CentOS 5/x86_64) in an active/passive
> heartbeat cluster.
> 
> After replacing a failed disk on the secondary node and
> starting the resync, I noticed that the load on the
> primary went up from the usual <0.5 to over 15, climbing
> to 20. The primary was also barely accessible (ssh slow to
> respond, nfs exports stalled) despite being mostly idle.
> Stopping the resync on the secondary or disconnecting drbd
> made the load drop again.
> 
> I "resolved" the issue by disconnecting and reconnecting
> after the completed raid resync. Needless to say, pretty ugly.
> 
> Before I post all the gory details (drbd/md config, etc):
> Is this known behavior? Why is a raid resync on the _secondary_
> node affecting the _primary_ at all? Odd.
> Maybe an issue with the 2.6.35 kernel?
> 

I am seeing similar symptoms with different drbd and kernel versions than
yours, so I don't think it is a bug; it is rather expected behavior. By
default md rebuilds the raid at the maximum possible speed and only
throttles down when there is other IO, while drbd also tries to resync at
its configured rate ... so the combined load on the disks may exceed what
they can handle.
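To make that arithmetic concrete, here is a small Python sketch with purely hypothetical numbers (a member disk sustaining about 100 MB/s, md resyncing at that full speed, drbd configured for 30 MB/s). Real throughputs depend on the hardware and on how aggressively md actually throttles itself, so treat it only as an illustration of why the sum can exceed the disk's capacity.

```python
def app_headroom_mb_s(disk_mb_s: float, md_resync_mb_s: float,
                      drbd_resync_mb_s: float) -> float:
    """Bandwidth (MB/s) left for applications while both resyncs run.

    A negative result means the two resyncs alone ask for more than the
    disk can deliver, so application IO has to queue up behind them.
    """
    return disk_mb_s - md_resync_mb_s - drbd_resync_mb_s


if __name__ == "__main__":
    # Hypothetical figures, not measurements from this thread.
    disk = 100.0       # sustained throughput of one member disk
    md_full = 100.0    # md rebuilding "at the maximum possible speed"
    drbd_rate = 30.0   # drbd resync rate from its configuration

    print(app_headroom_mb_s(disk, md_full, drbd_rate))  # -30.0: oversubscribed
    print(app_headroom_mb_s(disk, 40.0, 20.0))          # 40.0 left for the apps
```

Capping both resync rates below the disk's throughput is what turns the first case into the second.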

I have limited the resync speed of both drbd and the raid in general to
some value lower than what the disks are capable of (leaving some
bandwidth for the applications too). Because in my case drbd takes (much)
less time to resync, when necessary I manually set the raid resync speed
to the minimum possible value (you may even try 0 as speed_max). Another
option is to use pause-sync / resume-sync in drbd instead of
disconnecting. With the above the load still reaches 5-10, but at least
the system stays responsive.
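A minimal Python sketch of the knobs mentioned above. It assumes Linux md's global resync limits are the files /proc/sys/dev/raid/speed_limit_min and speed_limit_max (values in KiB/s) and that `drbdadm pause-sync` / `drbdadm resume-sync` take a resource name. The helper names and the resource name "r0" are hypothetical. The drbd helper defaults to a dry run that only returns the command. The drbd resync rate itself is set in drbd.conf and is not touched here. Writing to /proc requires root.

```python
import subprocess
from pathlib import Path

# Global md resync limits (KiB/s per device). Per-array overrides live under
# /sys/block/mdX/md/sync_speed_{min,max} and are not handled here.
MD_LIMITS_DIR = Path("/proc/sys/dev/raid")


def set_md_resync_limits(min_kib_s: int, max_kib_s: int,
                         base: Path = MD_LIMITS_DIR) -> None:
    """Set md's global resync speed floor and ceiling (needs root on /proc)."""
    if min_kib_s < 0 or max_kib_s < 0:
        raise ValueError("speed limits must be non-negative")
    (base / "speed_limit_min").write_text(f"{min_kib_s}\n")
    (base / "speed_limit_max").write_text(f"{max_kib_s}\n")


def drbd_resync(action: str, resource: str, dry_run: bool = True) -> list[str]:
    """Build (and optionally run) `drbdadm pause-sync|resume-sync <resource>`."""
    if action not in ("pause-sync", "resume-sync"):
        raise ValueError(f"unsupported action: {action}")
    cmd = ["drbdadm", action, resource]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Dry run only: print the commands for a hypothetical resource "r0".
    print(drbd_resync("pause-sync", "r0"))
    print(drbd_resync("resume-sync", "r0"))
```

The `base` parameter exists so the md helper can be pointed at a scratch directory for testing instead of the live /proc tree.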


> The cluster is in production, but I'd be able to try some
> things after hours, especially on the secondary.
> 
> If you need specific configuration information, 
> please just let me know.
> 
> Regards, 
> Walter
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user


