[DRBD-user] Out of memory error when invoking fence-handler

Digimer lists at alteeve.ca
Mon Nov 10 15:00:45 CET 2014



On 10/11/14 04:11 AM, Lars Ellenberg wrote:
> On Sun, Nov 09, 2014 at 04:05:52PM -0500, Digimer wrote:
>> CentOS 6.6, DRBD 8.3.16.
>>
>> So this sucked:
>>
>> After rebooting and restoring, I retried and got the same result a
>> second time. After moving my VMs to the other node, I tested
>> crashing the other node and again saw the "out of mem, failed to
>> invoke fence-peer helper" message. After that, I rebooted both
>> nodes. I've not yet tested if that resolved the issue.
>>
>> Anyone seen this before?
>
>> *** Nov  9 15:18:40 fea-c01n01 kernel: block drbd0: out of mem,
>> failed to invoke fence-peer helper
>
> Sure.
>
> Your kernel is too new for this DRBD.
> Your DRBD is too old for this kernel.
>
>
> As you know, we sometimes start some "handlers".
> We spawn new kernel threads for this.
>
> One of the relevant functions is kthread_run
> (and everything it calls).
>
> That used to fail only for hard out of memory conditions.
> (Thus the "nonsense" error message)
>
> At some point, upstream kernel changed the internals
> of that code path to no longer do a wait_for_completion(),
> but to do a wait_for_completion_killable().
>
>> Nov  9 15:21:16 fea-c01n01 kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
>
> And apparently RHEL 6.6 has backported that change.
>
> Which means that now this can also fail because of pending signals.
> DRBD routinely may have a signal pending in the calling thread there.
>
> Upstream fix:
> http://git.linbit.com/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=e998365475194a8faf31a86081e88034d7bd1a41
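
To check my understanding of the explanation above, here is a toy userspace model of the failure mode. It is only a sketch, not DRBD or kernel code: every `toy_*` name and the flag are made up for illustration, and the `-EINTR` return value is my assumption about what an interrupted killable wait hands back.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for "a signal is pending in the calling thread". */
static bool signal_pending_flag = false;

/* Toy analogue of flush_signals(current): drop whatever is pending. */
static void toy_flush_signals(void)
{
	signal_pending_flag = false;
}

/*
 * Toy analogue of kthread_run() once its internal wait is killable:
 * a pending signal makes it return an error even though memory is fine.
 * The -EINTR value is an assumption made for this sketch.
 */
static int toy_kthread_run(void)
{
	if (signal_pending_flag)
		return -EINTR;	/* interrupted, not out of memory */
	return 0;		/* helper thread "started" */
}

/* Mimics the fence-peer call site: 0 on success, negative errno on failure. */
static int toy_invoke_fence_helper(bool flush_first)
{
	signal_pending_flag = true;	/* the caller may have a signal pending */
	if (flush_first)
		toy_flush_signals();
	return toy_kthread_run();
}
```

In this model, toy_invoke_fence_helper(false) fails and the caller would log the misleading "out of mem" message, while toy_invoke_fence_helper(true) succeeds because the pending signal is cleared before the thread is spawned.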

Another list user emailed me off-list pointing to that fix as well.
The problem is that it doesn't match the 8.3.16 source I have:

===
void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
{
         struct task_struct *opa;

         opa = kthread_run(_try_outdate_peer_async, mdev,
                           "drbd%d_a_helper", mdev_to_minor(mdev));
         if (IS_ERR(opa))
                 dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
}
===

Can I simply add the two missing lines?

===
void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
{
         struct task_struct *opa;

         kref_get(&connection->kref);
         flush_signals(current);
         opa = kthread_run(_try_outdate_peer_async, mdev,
                           "drbd%d_a_helper", mdev_to_minor(mdev));
         if (IS_ERR(opa))
                 dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
}
===

Thanks

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


