[DRBD-user] 8.3.5 Stalling on sync FIXED

James Larcombe jim at roadtech.co.uk
Thu Dec 3 14:16:28 CET 2009



Hi all,

Thanks to Morey and Mike for their advice.

DRBD is now syncing correctly without any stalling. Upgrading the firmware
and the matching driver from HP worked a treat.

Thanks

Jim

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of
drbd-user-request at lists.linbit.com
Sent: 25 November 2009 21:44
To: drbd-user at lists.linbit.com
Subject: drbd-user Digest, Vol 64, Issue 42


Today's Topics:

   1. Re: 8.3.5 Stalling on sync (Roof, Morey R.)
   2. Re: 8.3.5 Stalling on sync (Mike Lovell) (David.Livingstone at cn.ca)


----------------------------------------------------------------------

Message: 1
Date: Wed, 25 Nov 2009 14:03:14 -0700
From: "Roof, Morey R." <MRoof at admin.nmt.edu>
Subject: Re: [DRBD-user] 8.3.5 Stalling on sync
To: <drbd-user at lists.linbit.com>
Message-ID: <C99FEB4E3BA7A84A854CE66AE19CCDE8135E2FE4 at admin.NMTADM.AD>
Content-Type: text/plain; charset="iso-8859-1"

Actually I have those exact cards and I'm not seeing your problem but
getting those cards to work was a major pain in the rear end.  I much prefer
the Myricom cards but for this HP server pair I got stuck using the HP cards
due to a political issue.
 
Anyways, some of the things I found out about these cards might be of help
to you.  We use SuSE here but doing the same for RedHat shouldn't be much of
a problem.  The biggest issue is that these cards get very hot and can
overheat easily if they don't have a good amount of airflow.  Once they begin
to overheat, packets disappear and things fall apart.  Since you are seeing
stalls after a bit of a run, I would think that you might be having an
overheating issue.
 
Also, the driver that comes with the Linux kernel doesn't work very well, so
you need to get the HP driver and install it.  HOWEVER, you absolutely must
use the driver version that matches the firmware version.  If they are
different, things don't work and you can't even run the diagnostic tool.  Here
I'm running firmware 4.0.516 and driver 4.0.516.
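
A minimal sketch of the driver/firmware check described above. It assumes the
adapter's driver reports both versions through `ethtool -i` (the `version:` and
`firmware-version:` fields) and that the two strings are directly comparable,
as with the 4.0.516 pair mentioned here. Whether the HP driver reports them in
exactly that form is an assumption, and the function names are illustrative.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def versions_match(info):
    """True when the first token of the firmware field equals the driver version.

    Assumption: both fields carry the same dotted version (e.g. 4.0.516).
    Some drivers append extra text to the firmware field, so only the
    first whitespace-separated token is compared.
    """
    driver = info.get("version", "")
    firmware_tokens = info.get("firmware-version", "").split()
    return bool(driver) and bool(firmware_tokens) and firmware_tokens[0] == driver


def check_interface(iface):
    """Run `ethtool -i` for one interface and report the two versions."""
    out = subprocess.run(
        ["ethtool", "-i", iface], capture_output=True, text=True, check=True
    ).stdout
    info = parse_ethtool_info(out)
    return info.get("version"), info.get("firmware-version"), versions_match(info)
```

Calling `check_interface("eth2")` (with `eth2` standing in for the 10GbE port,
a hypothetical name) returns the driver version, the firmware version, and
whether they agree.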
 
When I was trying to get these working, I would set up long runs of netperf
and iperf to see how hot I could get the cards, and then run the diagnostic
tool, as it will tell you the temperature of the card.  I have found they
start to freak out at about 85C.  After playing around with card position,
they run under load at 66C and seem to work fine with 27C ambient air temp.
 
All in all I'm not very impressed with these cards but I got stuck using
them in one place.
 
Hope the information helps a bit,
Morey


________________________________

From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Mike Lovell
Sent: Wednesday, November 25, 2009 11:45 AM
To: James Larcombe
Cc: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] 8.3.5 Stalling on sync


hrm. i thought i had heard of someone using drbd over 10 gig with netxen
cards. i went looking for a few minutes and didn't find anything though. my
recommendation would be to try newer drivers, either through compiling the
drivers for your existing kernel or using a newer kernel. i don't have
details on how to do that for your cards cause i have never used any 10 gig
from hp or netxen. other than that, my only recommendation is new nics.

good luck

mike

James Larcombe wrote: 

	Hi Mike,

	

	The cards I'm using are HP NC522SFP Dual Port 10GbE Server Adapters
with HP BLc 10Gb SR SFP+ Fiber Transceivers. I could try running these with
1Gb Fiber cables instead of 10Gb.

	

	James

	

	From: Mike Lovell [mailto:mike at dev-zero.net] 
	Sent: 25 November 2009 16:01
	To: James Larcombe
	Cc: drbd-user at lists.linbit.com
	Subject: Re: [DRBD-user] 8.3.5 Stalling on sync

	

	nothing i tried tweaking in drbd.conf worked. the only thing that
did was changing the 10gig interfaces. what cards are you using? i was using
ones with an intel chip. the cards that i did get it to work with were from
chelsio. in my previous thread on the list, someone mentioned that they had
neterion cards working.
	
	mike
	
	James Larcombe wrote: 

	Hi Mike,

	

	Thanks for the quick response. Yes you are correct we are using
10gig fibre cards. I'm not sure we could change them though as the fibre
modules used in them cost over £400 each.

	

	Is there anything I can tweak in the drbd.conf file to get these to
work?

	

	James

	

	From: Mike Lovell [mailto:mike at dev-zero.net] 
	Sent: 24 November 2009 17:49
	To: James Larcombe
	Cc: drbd-user at lists.linbit.com
	Subject: Re: [DRBD-user] 8.3.5 Stalling on sync

	

	James Larcombe wrote: 

	Hi List,

	

	Please help. I have installed DRBD 8.3.5 on openSUSE 11.1 (kernel
2.6.27.29-0.1).

	

	I have run drbdadm create-md dbms-test on one node and create-md
dbms-test2 on the other node. I then ran drbdadm up all on both nodes. I
then ran drbdadm -- --overwrite-data-of-my-peer primary dbms-test on the
first node and the same with dbms-test2 on the other node. They then run for
a short while before stalling. I have tried older versions without success,
and turning the sync rate down does not make any difference. Downing the
resources and bringing them back up starts the sync again, but this then
stalls quickly.
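
The sequence described above can be written down as a short sketch. It only
builds and prints the three `drbdadm` invocations quoted in the post; the
helper names and the `dry_run` switch are illustrative assumptions, and
nothing is executed unless `dry_run` is turned off.

```python
import shlex
import subprocess


def initial_sync_commands(resource, become_sync_source=False):
    """Return the drbdadm commands quoted in the post, in order, for one node."""
    commands = [
        ["drbdadm", "create-md", resource],
        ["drbdadm", "up", "all"],
    ]
    if become_sync_source:
        # Only the node whose data should win runs this step.
        commands.append(
            ["drbdadm", "--", "--overwrite-data-of-my-peer", "primary", resource]
        )
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

For example, `run_commands(initial_sync_commands("dbms-test", True))` prints
the three commands for the first node without running them.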

	

	I have attached /proc/drbd, /etc/drbd.conf and a section from
/var/log/messages. Any pointers would be greatly appreciated.

	

	version: 8.3.5 (api:88/proto:86-91)
	GIT-hash: ded8cdf09b0efa1460e8ce7a72327c60ff2210fb build by root at hp-tm-40, 2009-11-24 12:21:46
	 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
	    ns:160896 nr:0 dw:0 dr:160896 al:0 bm:9 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:926694296
	        [>.] sync'ed:  0.1% (905040/905132)M      4972
	        stalled
	 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r----
	    ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 ua:0 ap:0 ep:1 wo:b oos:777971256
	        [>.] sync'ed:  0.3% (759736/761856)M
	        Stalled
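
As a rough illustration of how a stall like the one above could be spotted
automatically, the sketch below parses two samples of `/proc/drbd` text in the
8.3-style layout shown here and reports devices that are still in a Sync*
connection state but whose `oos` (out-of-sync) counter has not dropped. The
function names and the "oos did not shrink" rule are assumptions for
illustration, not something DRBD itself provides.

```python
import re

DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)")
OOS_RE = re.compile(r"\boos:(\d+)")


def parse_proc_drbd(text):
    """Return {minor: {"cs", "ro", "ds", "oos"}} from /proc/drbd text."""
    devices = {}
    current = None
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            current = int(m.group(1))
            devices[current] = {
                "cs": m.group(2),
                "ro": m.group(3),
                "ds": m.group(4),
                "oos": None,
            }
            continue
        m = OOS_RE.search(line)
        if m and current is not None:
            devices[current]["oos"] = int(m.group(1))
    return devices


def stalled_devices(before, after):
    """Minors still in a Sync* state whose oos did not shrink between samples."""
    stalled = []
    for minor, dev in after.items():
        prev = before.get(minor)
        if prev is None or not dev["cs"].startswith("Sync"):
            continue
        if dev["oos"] is not None and prev["oos"] is not None:
            if dev["oos"] >= prev["oos"]:
                stalled.append(minor)
    return stalled
```

In practice the two samples would be read from `/proc/drbd` some seconds apart;
the interval is a judgment call that depends on the configured sync rate.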

	
	what kind of network are you using between the two servers? this is
almost the exact same behavior i had when i was trying to get drbd to work
over 10gig ethernet. turned out to be something in drbd didn't like
something about the 10gig cards i had. i eventually had to change my network
cards. what cards are you using? 1gig? 10gig? have you tried other cards?
that is where i would look.
	
	mike

	



------------------------------

Message: 2
Date: Wed, 25 Nov 2009 14:27:29 -0700
From: David.Livingstone at cn.ca
Subject: Re: [DRBD-user] 8.3.5 Stalling on sync (Mike Lovell)
To: drbd-user at lists.linbit.com
Message-ID:
	<OF45AFA1F2.C44E4250-ON87257679.007476DA-87257679.0075DEFE at cn.ca>
Content-Type: text/plain; charset="us-ascii"

I've been using an HP NC510C PCIe 10 gigabit NIC (NetXen) since early 2009 in
a DRBD setup between two DL380G5. We did experience hanging issues with
the card, but this was related to driver versions (HP PSP support packs).
We ended up opening a case with HP and are currently running on an "older"
version of the nx_nic driver. If you want, I will send you the specifics
offline.

BTW I just purchased some DL380G6 with NC522SFP (with BLc copper) and will
be setting them up in the New Year.

