[DRBD-user] DRBD crash with bad network

Maxence DUNNEWIND maxence at dunnewind.net
Thu Apr 1 10:07:00 CEST 2010


> > I have about 40 drbd devices per node (primary and secondaries). Our provider
> > has a lot of network issues, which sometimes cause drbd to disconnect/reconnect
> > very often: about 500 NetworkFailure events in the hour before the last crash:
> > # grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
> > 483
> 
> So you are using DRBD with ganeti in a cloud?
> Which cloud?
What do you mean by "which cloud"?
> The most interesting line is before that.
> 
> > Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2 
> 
> > Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G        W  2.6.30-2-amd64 #1 X8STi
> > Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9
> 
> Hard out of memory?
> Did you google for "2.6.30 cache_alloc_refill",
> and check that you are not affected by any of those?

Yes, but there is not much out there. Could it be that, because of the large
number of NetworkFailure/reconnect cycles, the system does not free memory fast
enough, so that when the network or drbd driver asks for memory, the allocation
fails and the driver deactivates itself (especially if we are in some special
context, like an IRQ)?
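
To check whether the disconnects really pile up just before the crash, the
grep count quoted above can be broken down per hour. This is only a rough
sketch: it assumes the usual syslog timestamp prefix ("Mar 30 00:52:48 ...")
seen in the kernel lines above, and the function name failures_per_hour is
made up for illustration.

```python
import re
from collections import Counter

# Assumed syslog prefix: "Mon DD HH:MM:SS ". The captured group is the
# "Mon DD HH" part, which is used as the per-hour bucket key.
STAMP = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}):\d{2}:\d{2} ")

# Same string the grep command in this thread searches for.
PATTERN = "Connected -> NetworkFailure"


def failures_per_hour(lines):
    """Count 'Connected -> NetworkFailure' lines per 'Mon DD HH' bucket.

    `lines` is any iterable of log lines (hypothetical helper, not part
    of DRBD or any other tool).
    """
    counts = Counter()
    for line in lines:
        if PATTERN not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling it on an open /var/log/messages and printing the resulting counter
should show whether the failures are spread evenly or concentrated in the
hour of the crash.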

Maxence
-- 
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533

