[DRBD-user] Problems with applications on top of DRBD locking up, becoming defunct

Lavender, Ben ben.lavender at mcdean-europe.com
Tue Sep 4 13:21:56 CEST 2007



This seems to be solved, so I'm putting the fix on the mailing list, since
I'm sure other people have encountered these issues:

A lot of new Dell PowerEdge servers come with a TCP Offload Engine
associated with the onboard Broadcom Gigabit NICs.  Though it would seem
that simply not installing a driver for this would mean it does nothing
and there would be no problems, this is not at all the case.

Turning off the TOE requires one to open the case and remove a
dongle/widget/plugged-in thing with an RJ-11 style connector.  There's
no software switch on the boxes I have.

The TOE problems manifested themselves in other applications, such as
tcpdump, but nowhere were they as clear as DRBD.  Removing the physical
TOE bit seems to have ended my DRBD issues.  I was locking up every few
hours, and have been running now for a week without issues.  I even
recovered nicely from a split-brain after the switch went down during
some power testing, with no reboot required and no module issues.
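
For anyone trying to confirm the same symptom before opening the case, here
is a rough sketch that lists processes stuck in uninterruptible sleep (state
D, which ignores even kill -9) or already defunct (state Z).  It assumes a
Linux /proc filesystem and Python 3; the function names parse_stat_state and
stuck_processes are made up for illustration and are not part of DRBD or any
other tool mentioned here.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def stuck_processes(proc_root="/proc"):
    """List (pid, comm, state) for processes in state D or Z."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'drbd'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError):
            continue  # process exited, or the file was unreadable
        if state in ("D", "Z"):
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    for pid, comm, state in stuck_processes():
        print(pid, state, comm)
```

Run it a few times, some minutes apart.  A process that stays in D across
runs while it is working on the DRBD mount matches the lockups described in
this thread; a short-lived D state during normal disk I/O does not.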

(PS--allowing DRBD to send messages over a null modem cable along with
heartbeat to avoid a split brain on network device failure would be
swell.  Dedicated Gigabit NICs are expensive and I don't have ports to
spare on all of my machines).

Ben

> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-
> bounces at lists.linbit.com] On Behalf Of Lavender, Ben
> Sent: Thursday, August 30, 2007 10:29 AM
> To: drbd-user at lists.linbit.com
> Subject: RE: [DRBD-user] Problems with applications on top of DRBD locking
> up, becoming defunct
> 
> The co'd branch doesn't build for me on a machine that can build the
> 8.0.5 tgz.
> 
> Make works, but after removing the rpms from 8.0.5 and removing the
> module, whatever make install did seems to have lost them.  I'd be
> more comfortable with make rpm, since I can then more easily play with
> versions; do I need to be adding something to the svn to let 'make
> rpm' work?
> 
> I'm trying to verify if my kernel version uses the newer version of
> generic_make_request and I'm 90% sure it does not.
> 
> Ben
> 
> 
> 
> > -----Original Message-----
> > From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-
> > bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
> > Sent: Wednesday, August 29, 2007 11:53 PM
> > To: drbd-user at lists.linbit.com
> > Subject: Re: [DRBD-user] Problems with applications on top of DRBD locking
> > up, becoming defunct
> >
> > On Wed, Aug 29, 2007 at 07:30:51PM +0200, Lavender, Ben wrote:
> > > I'm working on getting a few DRBD servers up and running, but I'm
> > > having a ton of trouble with it.  I've set up DRBD/heartbeat on some
> > > smaller servers before with no problems, but this current batch is
> > > behaving strangely with no error messages reported anywhere.
> > >
> > > Basically, any application (so far pound, nfsd, mysql, and ldap,
> > > along with a number of filesystem tools) will eventually lock up
> > > while working with the mounted drbd volume.  It can happen with
> > > something as simple as 'ls', or 'touch'.  After whatever process
> > > locks up, the drbd volume is considered busy and can't be released
> > > without fencing it.  After a reboot, this happens in less than 3
> > > hours.  Killing whatever process is having the problem will do
> > > nothing without -9; with -9 they will sometimes become defunct and
> > > sometimes do nothing.
> > >
> > > /proc/drbd shows nothing unusual.  This is an example after my
> > > second two drbd volumes have locked up, but I managed to unmount and
> > > secondary one of them.  The second one is locked with mysql and the
> > > third with umount.
> > >
> > >
> > >
> > > [root at backend1 ben]# cat /proc/drbd
> > >
> > > version: 8.0.5 (api:86/proto:86)
> > ....
> >
> > > These are the same machines that prompted my 'WFBitMapT and
> > > WFBitMapS' message from earlier this week which has not seen a
> > > response, namely, Dell 2950's with hardware RAID (Perc5's on the
> > > megaraid driver), x86_64, RHEL5, SELinux enabled.  The DRBD volumes
> > > live on top of LVM partitions.  Kernel is 2.6.18-8.1.8, DRBD is
> > > 8.0.5, all mentioned software is either the CentOS5 or RHEL5 version
> > > of the particular package.
> >
> > your working installations use a different kernel, I assume?
> > with this particular kernel it locks up?
> >
> > vendor kernels are known to backport or introduce things early.
> > it may well be the same problem that shows in kernel.org 2.6.22 and
> > later, a lockup in recursive generic_make_request.
> >
> > please try again with a recent checkout of drbd from
> >  svn co http://svn.drbd.org/drbd/branches/drbd-8.0
> >
> > --
> > : Lars Ellenberg                            Tel +43-1-8178292-0  :
> > : LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
> > : Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
> > __
> > please use the "List-Reply" function of your email client.
> > _______________________________________________
> > drbd-user mailing list
> > drbd-user at lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user


