[DRBD-user] DRBD crash with bad network

Lars Ellenberg lars.ellenberg at linbit.com
Wed Mar 31 21:21:45 CEST 2010


On Tue, Mar 30, 2010 at 10:34:06AM +0200, Maxence DUNNEWIND wrote:
> Hi, 
> 
> I have a cluster of 10 servers with many drbd devices. The drbd version is
> 8.3.7, module loaded with :
> drbd minor_count=128 usermode_helper=/bin/true
> (because I use it with ganeti).
> 
> I have about 40 drbd devices per node (primaries and secondaries). Our provider
> has a lot of network issues, which sometimes cause drbd to disconnect/reconnect
> very often: about 500 NetworkFailure events in the hour before the last crash:
> # grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
> 483

So you are using DRBD with ganeti in a cloud?
Which cloud?

> Then the crash log :

The most interesting line is the one before that.

> Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2 

> Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G        W  2.6.30-2-amd64 #1 X8STi
> Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9

Hard out of memory?
Did you google for "2.6.30 cache_alloc_refill"
and check that you are not affected by any of the issues that turn up?
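
For reference, a minimal sketch of how one might pull the kernel messages
logged just before the oops, and count the disconnects per hour instead of
grepping for one hour at a time. It assumes GNU grep (for -m and -B), the
traditional syslog timestamp format ("Mar 30 00:52:48 ..."), and that the
log lives in /var/log/messages as in the quoted command. The function names
are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the syslog layout
# (month, day, HH:MM:SS as the first three fields) is an assumption.

# Print up to N lines of context before the first line matching PATTERN
# in LOGFILE, followed by the matching line itself.
# Relies on GNU grep: -m 1 stops after the first match, -B N prints
# N lines of leading context.
lines_before() {
    pattern=$1
    logfile=$2
    n=${3:-20}
    grep -m 1 -B "$n" -e "$pattern" "$logfile"
}

# Count "Connected -> NetworkFailure" transitions per hour in LOGFILE.
# Output is one line per hour: count, month, day, hour.
count_failures_per_hour() {
    grep "Connected -> NetworkFailure" "$1" |
        awk '{ split($3, t, ":"); print $1, $2, t[1] }' |
        sort | uniq -c
}
```

Something like `lines_before "cache_alloc_refill" /var/log/messages 30`
would then show the 30 lines leading up to the RIP line, which is where
an out-of-memory or allocation failure message would be expected to show
up, if there is one.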


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


