On Mon, Nov 10, 2014 at 09:00:45AM -0500, Digimer wrote:
> On 10/11/14 04:11 AM, Lars Ellenberg wrote:
> >On Sun, Nov 09, 2014 at 04:05:52PM -0500, Digimer wrote:
> >>CentOS 6.6, DRBD 8.3.16.
> >>
> >>So this sucked:
> >>
> >>After rebooting and restoring, I retried and got the same result a
> >>second time. After moving my VMs to the other node, I tested
> >>crashing the other node and again saw the "out of mem, failed to
> >>invoke fence-peer helper" message. After that, I rebooted both
> >>nodes. I've not yet tested if that resolved the issue.
> >>
> >>Anyone seen this before?
> >
> >>*** Nov 9 15:18:40 fea-c01n01 kernel: block drbd0: out of mem,
> >>failed to invoke fence-peer helper
> >
> >Sure.
> >
> >Your kernel is too new for this DRBD.
> >Your DRBD is too old for this kernel.
> >
> >
> >As you know, we sometimes start some "handlers".
> >We spawn new kernel threads for this.
> >
> >One of the relevant functions is kthread_run
> >(and everything it calls).
> >
> >That used to fail only for hard out of memory conditions.
> >(Thus the "nonsense" error message)
> >
> >At some point, upstream kernel changed the internals
> >of that code path to no longer do a wait_for_completion(),
> >but to do a wait_for_completion_killable().
> >
> >>Nov 9 15:21:16 fea-c01n01 kernel: Not tainted 2.6.32-504.el6.x86_64 #1
> >
> >And apparently RHEL 6.6 has backported that change.
> >
> >Which means that now this can also fail because of pending signals.
> >DRBD routinely may have a signal pending in the calling thread there.
> >
> >Upstream fix:
> >http://git.linbit.com/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=e998365475194a8faf31a86081e88034d7bd1a41
>
> Another list user emailed me off list pointing to that fix as well.
> Problem is, it doesn't match the 8.3.16 source I have...

So?
There is *exactly* one occurrence of kthread_run in the drbd source,
and the patch consists of *exactly* one non-comment line,
which is "+ flush_signals(current);" ;-)

Besides: time to finally get rid of 8.3, then.

> ===
> void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
> {
>         struct task_struct *opa;
>
>         opa = kthread_run(_try_outdate_peer_async, mdev,
>                           "drbd%d_a_helper", mdev_to_minor(mdev));
>         if (IS_ERR(opa))
>                 dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
> }
> ===
>
> Can I simply add the two missing lines?:
>
> ===
> void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
> {
>         struct task_struct *opa;
>
>         kref_get(&connection->kref);

Don't add a kref get; that kref does not exist in 8.3 code.
(It is also only a _context_ line in the patch)

>         flush_signals(current);
>         opa = kthread_run(_try_outdate_peer_async, mdev,
>                           "drbd%d_a_helper", mdev_to_minor(mdev));
>         if (IS_ERR(opa))
>                 dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
> }

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
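
The failure mode discussed in this thread can be modelled in plain userspace C. This is only a sketch under stated assumptions: fake_task, fake_kthread_run and fake_flush_signals are hypothetical stand-ins, not kernel APIs, and the model reduces the kernel's behaviour to "spawning fails with -EINTR whenever the caller has a signal pending" (the exact error code is an assumption here). It is meant to show why clearing pending signals before spawning the helper avoids the misleading "out of mem" message, not how the kernel implements it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the calling thread; not the kernel's task_struct. */
struct fake_task {
    bool signal_pending;
};

/*
 * Hypothetical model of a thread spawn that waits "killably":
 * it gives up with -EINTR if the caller has a signal pending,
 * and succeeds (returns 0) otherwise. Memory is never the problem here.
 */
static int fake_kthread_run(const struct fake_task *caller)
{
    if (caller->signal_pending)
        return -EINTR;
    return 0;
}

/* Hypothetical model of flush_signals(current): drop whatever is pending. */
static void fake_flush_signals(struct fake_task *caller)
{
    caller->signal_pending = false;
}

/* Old behaviour: spawn directly, so a pending signal makes the spawn fail. */
static int spawn_helper_unpatched(struct fake_task *caller)
{
    return fake_kthread_run(caller);
}

/* Patched behaviour: clear pending signals first, then spawn. */
static int spawn_helper_patched(struct fake_task *caller)
{
    fake_flush_signals(caller);
    return fake_kthread_run(caller);
}
```

In this model, calling spawn_helper_unpatched() on a task whose signal_pending flag is set returns an error even though nothing is out of memory, while spawn_helper_patched() on the same task succeeds.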