[DRBD-user] Out of memory error when invoking fence-handler

Mon Nov 10 10:11:02 CET 2014

On Sun, Nov 09, 2014 at 04:05:52PM -0500, Digimer wrote:
> CentOS 6.6, DRBD 8.3.16.
> 
> So this sucked:
> 
> After rebooting and restoring, I retried and got the same result a
> second time. After moving my VMs to the other node, I tested
> crashing the other node and again saw the "out of mem, failed to
> invoke fence-peer helper" message. After that, I rebooted both
> nodes. I've not yet tested if that resolved the issue.
> 
> Anyone seen this before?

> *** Nov  9 15:18:40 fea-c01n01 kernel: block drbd0: out of mem,
> failed to invoke fence-peer helper

Sure.

Your kernel is too new for this DRBD.
Your DRBD is too old for this kernel.

As you know, we sometimes start some "handlers".
We spawn new kernel threads for this.

One of the relevant functions is kthread_run()
(and everything it calls).

That used to fail only under hard out-of-memory conditions.
(Hence the "out of mem" error message, which is nonsense in your case.)

At some point, the upstream kernel changed the internals
of that code path to use wait_for_completion_killable()
instead of wait_for_completion().

> Nov  9 15:21:16 fea-c01n01 kernel:      Not tainted 2.6.32-504.el6.x86_64 #1

And apparently RHEL 6.6 has backported that change.

Which means that now this can also fail because of pending signals.
DRBD may routinely have a signal pending in the calling thread at that point.
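A toy userspace model of this failure mode follows. It is only a sketch and rests on several assumptions. It is Python, not kernel code. wait_uninterruptible(), wait_killable() and invoke_helper() are hypothetical stand-ins, not DRBD or kernel APIs. A SIGHUP that is blocked and then raised plays the role of "a signal pending in the calling thread". This simplifies the kernel's own rules about which signals can interrupt a killable wait. The model shows that the same helper launch succeeds with the old wait and fails with the new one. It also shows that a caller which reports every failure as out of memory prints a misleading message. The model needs a Unix system and Python 3.8+ for signal.raise_signal().

```python
import errno
import signal
from typing import Callable

# Hypothetical stand-ins for illustration only; not kernel or DRBD APIs.

def wait_uninterruptible() -> int:
    """Old behaviour (modelled): the wait ignores signals and always completes."""
    return 0

def wait_killable() -> int:
    """New behaviour (simplified): give up early if SIGHUP is pending."""
    if signal.SIGHUP in signal.sigpending():
        return -errno.EINTR
    return 0

def invoke_helper(wait_fn: Callable[[], int]) -> int:
    """Launch a 'helper'; like the log message, report any failure as OOM."""
    err = wait_fn()
    if err:
        print("out of mem, failed to invoke fence-peer helper "
              f"(actual error: {errno.errorcode[-err]})")
    return err

# Block SIGHUP so that raising it leaves it pending instead of delivering it,
# mimicking a thread that has a signal pending when it spawns a helper.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGHUP})
signal.raise_signal(signal.SIGHUP)

print("uninterruptible:", invoke_helper(wait_uninterruptible))
print("killable:", invoke_helper(wait_killable))
```

With the signal pending, the first call returns 0. The second call prints the "out of mem" line, although the actual error is EINTR.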

Upstream fix:
http://git.linbit.com/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=e998365475194a8faf31a86081e88034d7bd1a41

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed