[Drbd-dev] drbd 8.4.3: refcounter overflow on re-sync

Wed Sep 24 20:07:11 CEST 2014

On 24 Sep 2014 at 18:31, Lars Ellenberg wrote:

> On Wed, Sep 24, 2014 at 05:57:42PM +0200, PaX Team wrote:
> > so in short: this is not for debugging, this doesn't replace one bug
> > with another, but it does prevent real life exploitation of refcount
> > overflow bugs.
> 
> It won't make things "work".  It probably makes things crash in less
> obscure ways, though, I give you that.

not sure what you mean by making things 'work' but the goal of exploit
prevention is to prevent the exploitation of (even unknown) bugs, not to
find or fix them (of course these are desirable side effects once an
attacker is unfortunate enough to try his exploit on a protected system).
see more below about the reaction part.

> > perhaps it's a consequence of the reaction from the kernel on the overflow
> > which is equivalent to a SIGKILL with all that it implies (files and network
> > connections get closed, etc).
> 
> That would be the result of the _ASM_EXTABLE()?
> or what causes that "reaction"?

no, the extable mechanism is only used to re-enter the kernel in a known
way to be able to report back on the detected refcount overflow. the actual
reaction is in pax_report_refcount_overflow (you'll need a grsec or PaX tree
to see its body, it's not in the upstream kernel). it basically logs details
about the overflow (registers, process info, etc) then forces a SIGKILL into
the task.
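
to make the detect/report split more concrete, here's a minimal userland
sketch in C11. it is not the PaX code (that one is arch specific and lives
in the kernel), and checked_inc, report_refcount_overflow and
overflow_reports are names made up for this illustration. the point it
shows is that the counter never wraps: the increment is refused at INT_MAX
and the report hook runs instead, which is where the real handler would log
details and force the SIGKILL.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* hypothetical stand-in for the real report function; the real one
 * logs registers/process info and forces a SIGKILL into the task,
 * this one only counts and prints */
static int overflow_reports;

static void report_refcount_overflow(const char *what)
{
	overflow_reports++;
	fprintf(stderr, "refcount overflow detected in %s\n", what);
}

/* checked increment: returns true if the counter was incremented,
 * false if the increment would have overflowed (counter is left
 * untouched at INT_MAX in that case) */
static bool checked_inc(atomic_int *v, const char *what)
{
	assert(v != NULL);

	int old = atomic_load(v);

	do {
		if (old == INT_MAX) {
			report_refcount_overflow(what);
			return false;
		}
		/* on failure 'old' is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return true;
}
```

a caller would do something like checked_inc(&obj_refs, "obj_refs") (with
obj_refs being some atomic_int of its own) and treat a false return as
'do not take the reference'.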

you can see its output in the original report in this thread. in fact, this
is what enabled me to figure out which atomic variable was involved and start
a discussion about this case (FYI, i've since turned both variables into the
'unchecked' type).
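
for those who haven't seen the idea of an 'unchecked' type before, here's a
rough userland analogue in C11. the names (unchecked_counter_t,
unchecked_inc_return) are invented for this sketch and are not the kernel
ones. the counter gets its own type so that wrapping accessors can't be
mixed up with overflow-checked ones by accident, and wraparound is simply
allowed since the value is not a reference count.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* hypothetical 'unchecked' counter: a distinct wrapper type for values
 * (statistics, sequence numbers, etc) where wrapping is harmless */
typedef struct {
	atomic_uint counter;
} unchecked_counter_t;

/* increments and returns the new value; unsigned arithmetic wraps
 * from UINT_MAX to 0 without any overflow report */
static unsigned int unchecked_inc_return(unchecked_counter_t *v)
{
	assert(v != NULL);
	return atomic_fetch_add(&v->counter, 1u) + 1u;
}
```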

> As the process in question in *this* case is a drbd kernel thread, it
> does not much care about that KILL. It notices, clears it, and lives on.

grsecurity handles kernel tasks too via gr_handle_kernel_exploit but for
the refcount overflow detection we specifically chose to ignore them for
two reasons. first, in the typical exploit scenario of these kinds of bugs
it's a userland process in whose context the refcount overflow triggers.

second, since this is an early detection (i.e., before any damage could
have been done by an attack), the kernel state isn't corrupted yet and is
thus recoverable, so it's not urgent to halt the system (which is otherwise
necessary when an unrecoverable state change occurs, think various forms of
memory corruption, etc).

> But how would KILL'ing an innocent userland process improve the overall
> situation?  Being a user land process, it cannot possibly be blamed for
> an in-kernel counter overflow, so why even kill it?

notwithstanding the very few false positives that arise due to our 'secure
by default' choice in handling atomic_t accessors (i actually blame the
kernel's lack of a proper abstraction layer on top of atomic_t ;), an exploit
is anything but an innocent userland process and the proper way to handle
it is to kill it and also ban the user account (all this is a configurable
choice in grsecurity).

cheers,
  PaX Team