Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi!

Coincidentally, another problem with drbd on top of md. I'm running drbd 8.3.12 on top of a 3-way software RAID1 (kernel 2.6.35.14, CentOS 5/x86_64) in an active/passive heartbeat cluster.

After replacing a failed disk on the secondary node and starting the resync, I noticed that the load on the primary went up from its usual <0.5 to over 15, eventually climbing to 20. The primary was also barely accessible (ssh slow to respond, NFS exports stalled) despite being mostly idle. Stopping the resync on the secondary or disconnecting drbd made the load drop again. I "resolved" the issue by disconnecting, waiting for the RAID resync to complete, and then reconnecting (see the P.S. for the commands). Needless to say, pretty ugly.

Before I post all the gory details (drbd/md config, etc.): Is this known behavior? Why does a RAID resync on the _secondary_ node affect the _primary_ at all? Odd. Maybe an issue with the 2.6.35 kernel?

The cluster is in production, but I'd be able to try some things after hours, especially on the secondary. If you need specific configuration information, please just let me know.

Regards,
Walter
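P.S. In case it helps, here is roughly what the workaround looked like on the secondary. "r0" stands in for my actual resource name, and the speed limit at the end is only an idea I could try after hours, not something I have tested:

    # take drbd offline so the md resync no longer hits the primary
    drbdadm disconnect r0

    # watch the raid rebuild until it finishes
    cat /proc/mdstat

    # reconnect; drbd then resyncs whatever changed in the meantime
    drbdadm connect r0

    # possible alternative for next time: throttle the md resync
    # system-wide (value in KB/s) instead of disconnecting drbd
    echo 5000 > /proc/sys/dev/raid/speed_limit_max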