On Tue, Mar 30, 2010 at 10:34:06AM +0200, Maxence DUNNEWIND wrote:
> Hi,
>
> I have a cluster of 10 servers with many drbd devices. The drbd version is
> 8.3.7, module loaded with :
> drbd minor_count=128 usermode_helper=/bin/true
> (because I use it with ganeti).
>
> I have about 40 drbd devices per node (primary and secondaries). Our provider
> has lot of network issues, which sometimes cause drbd to disconnect/reconnect
> very often : about 500 NetworkFailure in 1 hour before the last crash :
> # grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
> 483

So you are using DRBD with ganeti in a cloud?
Which cloud?

> Then the crash log :

The most interesting line is the one before that.

> Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2
> Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G W 2.6.30-2-amd64 #1 X8STi
> Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9

Hard out of memory?

Did you google for "2.6.30 cache_alloc_refill",
and check that you are not affected by any of those?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed