[Drbd-dev] drbd 8.4.3: refcounter overflow on re-sync

PaX Team pageexec at freemail.hu
Wed Sep 24 17:57:42 CEST 2014

On 24 Sep 2014 at 14:50, Lars Ellenberg wrote:

> So what PAX really is doing is redefine "atomic_add" and similar to
> basically become a no-op, if it would overflow.


> typedef struct { int counter } atomic_t;
> void atomic_add(int i, atomic_t *v)
> {
> 	v->counter += i;
> 	if (that_caused_a_counter_wrap_in_any_direction) {
> 		/* oops, overflow */
> 		SCREAM("help me, overflow...");
> 		v->counter -= i;

first, to reduce the window of exploitation, the refcount operation
is undone *before* the reaction (which is more than just a log btw).
second, on many archs (pretty much all but x86) the operation isn't
even committed to memory (hence doesn't need to be undone).

> 	}
> }
> If that *is* really an object refcount,
> and somewhere would be
> 	if (atomic_dec_and_test(that_count))
> 		free(some_object);
> then ok, you have replace one bug
> with an error message and different bug.
> Might help with debugging.  Not with much else. 

not correct ;). first of all, some stats based on 3.16.3 (not exact
counts, e.g., only a few archs are included as that's what i had in
my gtags database):

func          call-count
           normal   unchecked
inc          2100     300
add           180      45
add_return     80      35
kref_get      380     n/a
long_inc       60      50
long_add       30      20

so the overwhelming majority of atomic_t users is refcounts in fact,
not stats and other exceptional cases where overflow is normal and

second, for the refcount case any code path which leaves the refcount
unbalanced tends to be an exploitable security bug since it directly
ends up in a use-after-free situation when the refcount wraps around
0 and back (or whatever the trigger value for freeing is, 0 is typical).

stopping the refcount halfway means that such bugs are not exploitable.
the price you pay is a memory leak but you already pay that without the
overflow protection *and* get a nice root hole at the same time. clearly
this is not a debugging aid but a very efficient runtime exploit prevention
measure. in fact in practice this protection triggers on actual attacks
only, the refcount leaking paths don't get executed nearly enough (we
had some false positives that were only detected after some 200 days of
uptime, unfortunately finding them by static analysis is a hard problem).

so in short: this is not for debugging, this doesn't replace one bug
with another, but it does prevent real life exploitation of refcount
overflow bugs.

> But really.
> Precautionary changing (x + y) to be silently identical to (x + 0),
> "just in case", will surely generally improve program flow...  D'oh.

'just in case' you didn't know, signed overflow is undefined behaviour
in C.

> Anyways, now that I know PAX is really just keeping that counter
> at a fixed value of INT_MAX in this case, and nothing else,
> what would have caused DRBD to disconnect/reconnect?

perhaps it's a consequence of the reaction from the kernel on the overflow
which is equivalent to a SIGKILL with all that it implies (files and network
connections get closed, etc).

  PaX Team

More information about the drbd-dev mailing list