Note: "permalinks" may not be as permanent as we would like;
direct links from old sources may well be a few messages off.
On 10/11/14 04:11 AM, Lars Ellenberg wrote:
> On Sun, Nov 09, 2014 at 04:05:52PM -0500, Digimer wrote:
>> CentOS 6.6, DRBD 8.3.16.
>>
>> So this sucked:
>>
>> After rebooting and restoring, I retried and got the same result a
>> second time. After moving my VMs to the other node, I tested
>> crashing the other node and again saw the "out of mem, failed to
>> invoke fence-peer helper" message. After that, I rebooted both
>> nodes. I've not yet tested if that resolved the issue.
>>
>> Anyone seen this before?
>
>> *** Nov 9 15:18:40 fea-c01n01 kernel: block drbd0: out of mem,
>> failed to invoke fence-peer helper
>
> Sure.
>
> Your kernel is too new for this DRBD.
> Your DRBD is too old for this kernel.
>
>
> As you know, we sometimes start some "handlers".
> We spawn new kernel threads for this.
>
> One of the relevant functions is kthread_run
> (and everything it calls).
>
> That used to fail only for hard out of memory conditions.
> (Thus the "nonsense" error message)
>
> At some point, upstream kernel changed the internals
> of that code path to no longer do a wait_for_completion(),
> but to do a wait_for_completion_killable().
>
>> Nov 9 15:21:16 fea-c01n01 kernel: Not tainted 2.6.32-504.el6.x86_64 #1
>
> And apparently RHEL 6.6 has backported that change.
>
> Which means that now this can also fail because of pending signals.
> DRBD routinely may have a signal pending in the calling thread there.
>
> Upstream fix:
> http://git.linbit.com/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=e998365475194a8faf31a86081e88034d7bd1a41

Another list user emailed me off list pointing to that fix as well.
Problem is, it doesn't match the 8.3.16 source I have...

===
void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
{
	struct task_struct *opa;

	opa = kthread_run(_try_outdate_peer_async, mdev, "drbd%d_a_helper", mdev_to_minor(mdev));
	if (IS_ERR(opa))
		dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
}
===

Can I simply add the two missing lines?:

===
void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
{
	struct task_struct *opa;

	kref_get(&connection->kref);
	flush_signals(current);
	opa = kthread_run(_try_outdate_peer_async, mdev, "drbd%d_a_helper", mdev_to_minor(mdev));
	if (IS_ERR(opa))
		dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
}
===

Thanks

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
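
A rough userspace sketch of the failure mode Lars describes above, for
illustration only. It is not DRBD or kernel code: signal_is_pending(),
start_helper_killable(), flush_pending() and demo_pending_signal() are
hypothetical stand-ins for signal_pending(current), the killable wait
inside kthread_run(), and flush_signals(current). It assumes a POSIX
system and a single-threaded process, and only shows that a start
attempt which bails out on a pending signal succeeds once that signal
has been discarded.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for signal_pending(current):
 * true if signo is pending (raised while blocked). */
static bool signal_is_pending(int signo)
{
	sigset_t pending;

	sigemptyset(&pending);
	sigpending(&pending);
	return sigismember(&pending, signo) == 1;
}

/* Hypothetical stand-in for a "killable" wait: it gives up with
 * -EINTR when a signal is already pending, instead of waiting. */
static int start_helper_killable(int signo)
{
	if (signal_is_pending(signo))
		return -EINTR;
	return 0;
}

/* Hypothetical stand-in for flush_signals(current): consume and
 * discard the pending instance of signo (signo must be blocked). */
static void flush_pending(int signo)
{
	sigset_t set;
	int got;

	sigemptyset(&set);
	sigaddset(&set, signo);
	while (signal_is_pending(signo))
		sigwait(&set, &got);
}

/* Try to start the "helper" with a signal pending, then again
 * after flushing it. Results are stored in *before and *after. */
static void demo_pending_signal(int *before, int *after)
{
	sigset_t block, old;
	int rc;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	rc = sigprocmask(SIG_BLOCK, &block, &old);
	assert(rc == 0);

	raise(SIGUSR1);	/* blocked, so it stays pending */

	*before = start_helper_killable(SIGUSR1);	/* fails: signal pending */
	flush_pending(SIGUSR1);
	*after = start_helper_killable(SIGUSR1);	/* nothing pending now */

	sigprocmask(SIG_SETMASK, &old, NULL);
}
```

Calling demo_pending_signal(&before, &after) should leave
before == -EINTR and after == 0.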