[DRBD-user] DRBD 8.0.13 SyncTarget crashing with alloc_ee: Allocation of a page failed

Lars Ellenberg lars.ellenberg at linbit.com
Tue Feb 3 11:07:01 CET 2009


On Tue, Feb 03, 2009 at 09:47:37AM +0100, Peter Luciak wrote:
> Hello all,
>
> I'm experiencing weird crashes with drbd 8.0.13 when trying to  
> resynchronize the secondary node. The secondary crashes (without any  
> oops-es or other information in /var/log/messages) after some random  
> period of resynchronization (around 20-30%).
>
> On the primary there is a 2.6.15.6 kernel and on the secondary I tried  
> upgrading to 2.6.26.8. Now the resync went OK, but when I tested it  
> again, it crashed again. This is a 64b kernel and the machine has  
> Adaptec AIC7902 Ultra320 SCSI adapter with 4 disks in software RAID1  
> configuration. Interestingly, this problem started to appear when we  
> replaced one disk in the RAID array.
>
> Another drbd-user thread which I had found suggests that this could be  
> related to Supermicro motherboards. Indeed, there is a SuperMicro X6DA8
> G2 i7525 on the primary, but a TYAN Thunder i7525 on the secondary (i.e.
> the one which crashes). I've tried to load default settings on the Tyan
> board, but to no avail.
>
> Unfortunately, I don't have access to the servers physically, so I'm  
> trying to come up with a software solution (if possible :) Could
> an upgrade to drbd 8.3.x help in this case?

As I don't know what the problem is,
I cannot say whether it is fixed...

> Feb  3 10:45:41 vwsrv1 kernel: drbd1: alloc_ee: Allocation of a page failed

but this message may be an indication of a memory problem.
not necessarily, though.
anyway, you can try again with resync serialised and a reduced memory
footprint for in-flight requests:
  syncer { after $other_resource_name; }
  net    { max-buffers $smaller_value; }
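
as a rough sketch, assuming two resources with the hypothetical names r0
and r1, where r1 is the one that fails during resync, the relevant parts
of drbd.conf on 8.0 could look like this (512 is only an illustrative
value below the default of 2048, not a recommendation):

  resource r1 {
    syncer {
      after "r0";        # do not start resyncing r1 until r0 has finished
    }
    net {
      max-buffers 512;   # fewer buffer pages allocated for in-flight requests
    }
    # ... existing disk and host sections unchanged ...
  }

after editing the file on both nodes, "drbdadm adjust r1" should apply
the changed settings to the running resource.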

> For completeness, logs from secondary:
> Feb  3 10:45:28 vwsrv2 kernel: Total HugeTLB memory allocated, 0

it would be nice to capture the actual reason for the "crash"...
serial console?

you could use a usb-to-serial adapter to hook the server's serial port
up to a nearby system's usb port, configure the server's kernel to use
the serial console, raise the printk console log level, and log whatever
it spits out using gnu screen (or any other terminal program).
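
a minimal sketch of that setup, assuming the server's first serial port
(ttyS0) at 115200 baud and an adapter that shows up as /dev/ttyUSB0 on
the logging machine; device names and speed are assumptions and will
differ between systems:

  # on the crashing server: append to the kernel command line in the
  # boot loader configuration, then reboot
  console=tty0 console=ttyS0,115200n8

  # on the crashing server: let all kernel messages reach the console
  dmesg -n 8

  # on the nearby machine: attach to the adapter and log to screenlog.0
  screen -L /dev/ttyUSB0 115200

the two machines need to be connected with a null-modem cable.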

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
