[DRBD-user] DRBD crash with bad network

Lars Ellenberg lars.ellenberg at linbit.com
Thu Apr 1 13:33:44 CEST 2010



On Thu, Apr 01, 2010 at 08:07:00AM +0000, Maxence DUNNEWIND wrote:
> > The most interesting line is before that.
> > 
> > > Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2 
> > 
> > > Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G        W  2.6.30-2-amd64 #1 X8STi
> > > Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9
> > 
> > Hard out of memory?
> > Did you google for "2.6.30 cache_alloc_refill",
> > and check that you are not affected by any of those?
> 
> Yep, but there is not much to find. We suppose that, because of the many
> NetworkFailure / reconnection events, the system does not free memory fast
> enough, so that when the network/drbd driver asks for memory, the allocation
> fails and the driver deactivates itself (especially in some special context,
> like IRQ)?

Nothing "deactivates itself".
Either cache_alloc_refill fails to get memory, or it detects slab
corruption, or both, and that triggers a BUG_ON().
The log lines before the ones you quoted should tell you which.

You may need to do some math on the actual memory requirements,
taking into account drbd max-buffers, tcp_mem, page cache,
dirty pages, and tune things up or down in the drbd configuration
as well as various vm and network related sysctls.
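
A rough sketch of that math in Python, purely illustrative. It assumes
a 4 KiB page size, that drbd max-buffers and the tcp_mem values are
counted in pages, and that vm.dirty_ratio is a percentage of total
memory (the kernel actually applies it to available memory, so this
overestimates). The function name and parameters are hypothetical;
plug in the values from your own drbd.conf and sysctl output.

```python
PAGE_SIZE = 4096  # bytes; assumed, verify with `getconf PAGESIZE`


def estimate_memory_demand(max_buffers, tcp_mem_high_pages,
                           mem_total_bytes, dirty_ratio_percent,
                           drbd_devices=1, page_size=PAGE_SIZE):
    """Return a rough worst-case byte count per consumer.

    max_buffers          -- drbd.conf max-buffers (pages, per device)
    tcp_mem_high_pages   -- third value of net.ipv4.tcp_mem (pages)
    mem_total_bytes      -- MemTotal from /proc/meminfo, in bytes
    dirty_ratio_percent  -- vm.dirty_ratio
    """
    drbd = max_buffers * page_size * drbd_devices
    tcp = tcp_mem_high_pages * page_size
    # Simplification: dirty_ratio applied to total, not available, memory.
    dirty = mem_total_bytes * dirty_ratio_percent // 100
    return {
        "drbd_buffers": drbd,
        "tcp": tcp,
        "dirty_pages": dirty,
        "total": drbd + tcp + dirty,
    }


if __name__ == "__main__":
    # Example values only, not taken from the system discussed here.
    demand = estimate_memory_demand(
        max_buffers=2048,
        tcp_mem_high_pages=196608,
        mem_total_bytes=8 * 1024**3,
        dirty_ratio_percent=20,
    )
    for name, value in demand.items():
        print(f"{name:>12}: {value / 1024**2:10.1f} MiB")
```

If the total comes close to what the box has left after page cache and
applications, that is a hint about which knob to turn down first.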

You should have some more interesting logs before that,
which should help you in "guessing" what needs to be done.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


