[DRBD-user] NULL deref at drbd_submit_peer_request

Lars Ellenberg lars.ellenberg at linbit.com
Fri Feb 10 09:37:59 CET 2017


On Fri, Feb 10, 2017 at 09:58:46AM +0800, Jasmin J. wrote:
> Hi!
> 
> >>>When running a kind of system test (detach/attach loop in high system load),
> >
> >"Don't do that, then." :-)
> >[wonders what real-world scenario that test is supposed to exercise]
> He does a TEST to find hidden bugs!

> Such tests are an appropriate method, among others, to find design problems
> or implementation errors.

You don't say.

> Even if you think it is unlikely in a real-world scenario, it may happen
> for whatever reason in the real world anyway.

Uhm, no.

Let me phrase "Don't do that, then" differently.
A workaround exists: give the system room to breathe,
before you try to re-attach after a detach.
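
A minimal sketch of that workaround in Python, for anyone scripting such a
loop. Assumptions, none of them from this thread: the helper name
detach_then_attach is hypothetical, the resource has a single volume,
"drbdadm dstate" prints the local disk state first ("Local/Peer"), and a
local state of Diskless plus a short grace period is enough breathing room.
Treat the timings as placeholders to tune, not as recommended values.

```python
import subprocess
import time


def detach_then_attach(resource, run=subprocess.run, sleep=time.sleep,
                       settle_timeout=30.0, poll_interval=0.5, grace=2.0):
    """Detach, wait for the local disk to report Diskless, then attach.

    Hypothetical helper: `run` and `sleep` are injectable so the logic can
    be exercised without a real DRBD setup.
    """
    run(["drbdadm", "detach", resource], check=True)

    deadline = time.monotonic() + settle_timeout
    while True:
        result = run(["drbdadm", "dstate", resource],
                     check=True, capture_output=True, text=True)
        # Assumption: output looks like "Diskless/UpToDate", local state first.
        local_state = result.stdout.strip().split("/")[0]
        if local_state == "Diskless":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{resource}: still {local_state!r} after {settle_timeout}s")
        sleep(poll_interval)

    # Extra settling time before re-attaching (assumed value, tune as needed).
    sleep(grace)
    run(["drbdadm", "attach", resource], check=True)
```

Called as detach_then_attach("r0"), this serializes the two admin actions
instead of firing them back to back; it does not fix the underlying bug.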

> Think of the thousands of installations and the millions of hours your
> DRBD driver is used all over the world. It then becomes less unlikely,
> because of simple statistics.

Nope.

> I even have to cope with bit flips due to natural cosmic radiation in
> my daily business (mission critical systems for aerospace). They are
> also very unlikely.

Yes.  But that's a different class of problem.
Something that "just happens", and you have to deal with it.

His trigger for the bug is an admin action,
which he has absolute control over.
Or so I would hope.

> And if you look at his next email, he found a possibly asymmetric
> execution of ref-counting. That is a very common error in Linux kernel
> development and can be avoided by designing strictly symmetrical function
> executions. Such errors are hard to debug, and a test like his is a good
> method to find them.

You deal out wise insights by the dozen by default, don't you? ;-)

Sure. There is a bug there somewhere, when "simultaneously" detaching
and attaching while the system is still busy "recovering" from the prior
detach.  We'll eventually even look into that and fix it.

We have to prioritize somehow, though.
And spending time on debugging something in a path which can easily be
avoided (by simply not doing it; d'oh) won't get the high score.

Cheers,
    Lars

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed