On 10/11/14 09:16 AM, Lars Ellenberg wrote:
> On Mon, Nov 10, 2014 at 09:00:45AM -0500, Digimer wrote:
>> On 10/11/14 04:11 AM, Lars Ellenberg wrote:
>>> On Sun, Nov 09, 2014 at 04:05:52PM -0500, Digimer wrote:
>>>> CentOS 6.6, DRBD 8.3.16.
>>>>
>>>> So this sucked:
>>>>
>>>> After rebooting and restoring, I retried and got the same result a
>>>> second time. After moving my VMs to the other node, I tested
>>>> crashing the other node and again saw the "out of mem, failed to
>>>> invoke fence-peer helper" message. After that, I rebooted both
>>>> nodes. I've not yet tested if that resolved the issue.
>>>>
>>>> Anyone seen this before?
>>>
>>>> *** Nov 9 15:18:40 fea-c01n01 kernel: block drbd0: out of mem,
>>>> failed to invoke fence-peer helper
>>>
>>> Sure.
>>>
>>> Your kernel is too new for this DRBD.
>>> Your DRBD is too old for this kernel.
>>>
>>>
>>> As you know, we sometimes start some "handlers".
>>> We spawn new kernel threads for this.
>>>
>>> One of the relevant functions is kthread_run
>>> (and everything it calls).
>>>
>>> That used to fail only for hard out of memory conditions.
>>> (Thus the "nonsense" error message)
>>>
>>> At some point, upstream kernel changed the internals
>>> of that code path to no longer do a wait_for_completion(),
>>> but to do a wait_for_completion_killable().
>>>
>>>> Nov 9 15:21:16 fea-c01n01 kernel: Not tainted 2.6.32-504.el6.x86_64 #1
>>>
>>> And apparently RHEL 6.6 has backported that change.
>>>
>>> Which means that now this can also fail because of pending signals.
>>> DRBD routinely may have a signal pending in the calling thread there.
>>>
>>> Upstream fix:
>>> http://git.linbit.com/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=e998365475194a8faf31a86081e88034d7bd1a41
>>
>> Another list user emailed me off list pointing to that fix as well.
>> Problem is, it doesn't match the 8.3.16 source I have...
>
> So?
> There is *exactly* one occurrence of kthread_run in the drbd source,
> and the patch consists of *exactly* one non-comment line,
> which is "+ flush_signals(current);"
>
> ;-)
>
> Besides: time to finally get rid of 8.3, then.
>
>> ===
>> void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
>> {
>> 	struct task_struct *opa;
>>
>> 	opa = kthread_run(_try_outdate_peer_async, mdev,
>> 			"drbd%d_a_helper", mdev_to_minor(mdev));
>> 	if (IS_ERR(opa))
>> 		dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
>> }
>> ===
>>
>> Can I simply add the two missing lines?:
>>
>> ===
>> void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
>> {
>> 	struct task_struct *opa;
>>
>> 	kref_get(&connection->kref);
>
> Don't add a kref get; that kref does not exist in 8.3 code.
> (It is also only _context_ line in the patch)
>
>> 	flush_signals(current);
>> 	opa = kthread_run(_try_outdate_peer_async, mdev,
>> 			"drbd%d_a_helper", mdev_to_minor(mdev));
>> 	if (IS_ERR(opa))
>> 		dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
>> }

I knew it was context only, but it didn't match what was there so I
wanted to clarify.

So to summarize, I only add:

====
	flush_signals(current);
====

I'll brush off my old RPM notes and see if I can sort out a patch.

Thanks!

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
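
For readers following the thread, here is a rough user-space sketch of the failure and of why the one-line fix helps. It is not kernel code and not the DRBD source: mock_flush_signals(), mock_kthread_run(), mock_helper(), signal_pending_flag and helper_runs are all hypothetical stand-ins, and the -EINTR return is an assumption about how the "killable" wait fails. The only things taken from the thread are the ordering (flush signals, then start the helper thread) and the misleading error message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state: "a signal is pending in the calling thread". */
static bool signal_pending_flag;

/* Counts how often the (mock) fence-peer helper actually ran. */
static int helper_runs;

/* Stand-in for flush_signals(current): drop any pending signal. */
static void mock_flush_signals(void)
{
	signal_pending_flag = false;
}

/*
 * Stand-in for kthread_run() on a kernel that waits "killably":
 * assumed to fail with -EINTR when a signal is pending, otherwise
 * it runs the helper and returns its result.
 */
static int mock_kthread_run(int (*fn)(void *), void *data)
{
	if (signal_pending_flag)
		return -EINTR;
	return fn(data);
}

/* Stand-in for the fence-peer helper thread body. */
static int mock_helper(void *data)
{
	(void)data;
	helper_runs++;
	return 0;
}

/*
 * Model of drbd_try_outdate_peer_async(). With flush_first == false it
 * behaves like the unpatched 8.3.16 function quoted above; with
 * flush_first == true it includes the single added line.
 */
static int try_outdate_peer_async(bool flush_first)
{
	int rc;

	if (flush_first)
		mock_flush_signals();	/* the one-line fix */

	rc = mock_kthread_run(mock_helper, NULL);
	if (rc < 0)
		fprintf(stderr, "out of mem, failed to invoke fence-peer helper\n");
	return rc;
}
```

In this model, setting signal_pending_flag and calling try_outdate_peer_async(false) prints the "out of mem" message although no memory is involved, while try_outdate_peer_async(true) starts the helper.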