[DRBD-user] Out of memory error when invoking fence-handler

Lars Ellenberg lars.ellenberg at linbit.com
Mon Nov 10 15:16:12 CET 2014


On Mon, Nov 10, 2014 at 09:00:45AM -0500, Digimer wrote:
> On 10/11/14 04:11 AM, Lars Ellenberg wrote:
> >On Sun, Nov 09, 2014 at 04:05:52PM -0500, Digimer wrote:
> >>CentOS 6.6, DRBD 8.3.16.
> >>
> >>So this sucked:
> >>
> >>After rebooting and restoring, I retried and got the same result a
> >>second time. After moving my VMs to the other node, I tested
> >>crashing the other node and again saw the "out of mem, failed to
> >>invoke fence-peer helper" message. After that, I rebooted both
> >>nodes. I've not yet tested if that resolved the issue.
> >>
> >>Anyone seen this before?
> >
> >>*** Nov  9 15:18:40 fea-c01n01 kernel: block drbd0: out of mem, failed to invoke fence-peer helper
> >
> >Sure.
> >
> >Your kernel is too new for this DRBD.
> >Your DRBD is too old for this kernel.
> >
> >
> >As you know, we sometimes start some "handlers".
> >We spawn new kernel threads for this.
> >
> >One of the relevant functions is kthread_run
> >(and everything it calls).
> >
> >That used to fail only for hard out of memory conditions.
> >(Thus the "nonsense" error message)
> >
> >At some point, upstream kernel changed the internals
> >of that code path to no longer do a wait_for_completion(),
> >but to do a wait_for_completion_killable().
> >
> >>Nov  9 15:21:16 fea-c01n01 kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
> >
> >And apparently RHEL 6.6 has backported that change.
> >
> >Which means that now this can also fail because of pending signals.
> >DRBD routinely may have a signal pending in the calling thread there.
> >
> >Upstream fix:
> >http://git.linbit.com/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=e998365475194a8faf31a86081e88034d7bd1a41
> 
> Another list user emailed me off list pointing to that fix as well.
> Problem is, it doesn't match the 8.3.16 source I have...

So?
There is *exactly* one occurrence of kthread_run in the drbd source,
and the patch consists of *exactly* one non-comment line,
which is "+       flush_signals(current);"

 ;-)

Besides: time to finally get rid of 8.3, then.

> ===
> void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
> {
>         struct task_struct *opa;
> 
>         opa = kthread_run(_try_outdate_peer_async, mdev,
>                           "drbd%d_a_helper", mdev_to_minor(mdev));
>         if (IS_ERR(opa))
>                 dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
> }
> ===
> 
> Can I simply add the two missing lines?:
> 
> ===
> void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
> {
>         struct task_struct *opa;
> 
>         kref_get(&connection->kref);

Don't add a kref get; that kref does not exist in 8.3 code.
(It is also only a _context_ line in the patch.)

>         flush_signals(current);
>         opa = kthread_run(_try_outdate_peer_async, mdev,
>                           "drbd%d_a_helper", mdev_to_minor(mdev));
>         if (IS_ERR(opa))
>                 dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
> }
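
For illustration only, a small userspace model of why that single
flush_signals(current) line matters. This is not kernel code: every
model_* name below is made up for the sketch, and returning -EINTR from
the spawn step is an assumption about how an interrupted killable wait
is reported. The real 8.3 function only checks IS_ERR() on the result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "the calling thread has a signal pending". */
static bool model_signal_pending = false;

/* Hypothetical stand-in for flush_signals(current): drop pending signals. */
static void model_flush_signals(void)
{
        model_signal_pending = false;
}

/*
 * Hypothetical stand-in for kthread_run() on a kernel where thread
 * creation waits killably: a pending signal makes the spawn fail even
 * though memory is fine. The -EINTR value is an assumption.
 */
static int model_kthread_run(void)
{
        return model_signal_pending ? -EINTR : 0;
}

/* Mirrors the shape of drbd_try_outdate_peer_async(), with and without the fix. */
static int model_try_outdate_peer_async(bool flush_first)
{
        int err;

        if (flush_first)
                model_flush_signals();
        err = model_kthread_run();
        if (err)
                fprintf(stderr, "out of mem, failed to invoke fence-peer helper\n");
        return err;
}

/* Walks through both cases; callable from any test harness or main(). */
static void model_selfcheck(void)
{
        model_signal_pending = true;
        assert(model_try_outdate_peer_async(false) == -EINTR); /* old behaviour */

        model_signal_pending = true;
        assert(model_try_outdate_peer_async(true) == 0);       /* with the flush */
}
```

Without the flush, the model prints the same misleading "out of mem"
message although no allocation failed. With the flush, the spawn goes
through.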


-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
