[DRBD-user] 8.3.5 Stalling on sync

Roof, Morey R. MRoof at admin.nmt.edu
Wed Nov 25 22:03:14 CET 2009


Actually, I have those exact cards and I'm not seeing your problem, but getting them to work was a major pain in the rear end.  I much prefer the Myricom cards, but for this HP server pair I got stuck using the HP cards due to a political issue.
 
Anyway, some of the things I found out about these cards might be of help to you.  We use SuSE here, but doing the same for RedHat shouldn't be much of a problem.  The biggest issue is that these cards get very hot and can overheat easily if they don't have a good amount of airflow.  Once they begin to overheat, packets disappear and things fall apart.  Since you are seeing stalls after a bit of a run, I would think that you might be having an overheating issue.
 
Also, the driver that comes with the Linux kernel doesn't work very well, so you need to get the HP driver and install it.  HOWEVER, you absolutely must use the driver version that matches the firmware version.  If they are different, things don't work and you can't even run the diagnostic tool.  Here I'm running firmware 4.0.516 and driver 4.0.516.
 
When I was trying to get these working I would set up long runs of netperf and iperf to see how hot I could get the cards, and then run the diagnostic tool, which reports the temperature of the card.  I found they start to freak out at about 85C.  After playing around with card position they now run under load at 66C, with 27C ambient air temperature, and seem to work fine.
 
All in all I'm not very impressed with these cards but I got stuck using them in one place.
 
Hope the information helps a bit,
Morey


________________________________

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Mike Lovell
Sent: Wednesday, November 25, 2009 11:45 AM
To: James Larcombe
Cc: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] 8.3.5 Stalling on sync


hrm. i thought i had heard of someone using drbd over 10 gig with netxen cards. i went looking for a few minutes and didn't find anything though. my recommendation would be to try newer drivers, either by compiling the drivers for your existing kernel or by using a newer kernel. i don't have details on how to do that for your cards because i have never used any 10 gig from hp or netxen. other than that, my only recommendation is new nics.

good luck

mike

James Larcombe wrote: 

	Hi Mike,

	

	The cards I'm using are HP NC522SFP Dual Port 10GbE Server Adapters with HP BLc 10Gb SR SFP+ Fiber Transceivers. I could try running these with 1Gb fiber cables instead of 10Gb.

	

	James

	

	From: Mike Lovell [mailto:mike at dev-zero.net] 
	Sent: 25 November 2009 16:01
	To: James Larcombe
	Cc: drbd-user at lists.linbit.com
	Subject: Re: [DRBD-user] 8.3.5 Stalling on sync

	

	nothing i tried tweaking in drbd.conf worked. the only thing that did was changing the 10gig interfaces. what cards are you using? i was using ones with an intel chip. the cards that i did get it to work with were from chelsio. in my previous thread on the list, someone mentioned that they had neterion cards working.
	
	mike
	
	James Larcombe wrote: 

	Hi Mike,

	

	Thanks for the quick response. Yes you are correct we are using 10gig fibre cards. I'm not sure we could change them though as the fibre modules used in them cost over £400 each.

	

	Is there anything I can tweak in the drbd.conf file to get these to work?

	

	James

	

	From: Mike Lovell [mailto:mike at dev-zero.net] 
	Sent: 24 November 2009 17:49
	To: James Larcombe
	Cc: drbd-user at lists.linbit.com
	Subject: Re: [DRBD-user] 8.3.5 Stalling on sync

	

	James Larcombe wrote: 

	Hi List,

	

	Please help. I have installed drbd 8.3.5 on openSUSE 11.1 (kernel 2.6.27.29-0.1).

	

	I have run drbdadm create-md dbms-test on one node and drbdadm create-md dbms-test2 on the other node. I then ran drbdadm up all on both nodes. I then ran drbdadm -- --overwrite-data-of-my-peer primary dbms-test on the first node and the same with dbms-test2 on the other node. They then run for a short while before stalling. I have tried older versions without success, and turning the sync rate down does not make any difference. Downing the resources and bringing them back up starts the sync again, but it then stalls quickly.
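
	The steps described above can be written down as a small dry-run sketch in Python. It only builds and prints the command lines quoted in the message and does not execute anything. The function names `initial_sync_commands` and `render` are hypothetical helpers for illustration, and treating the three commands as one per-resource sequence is an assumption about how the steps were run.

```python
import shlex


def initial_sync_commands(resource):
    """Return the drbdadm command lines quoted above for one resource.

    Assumption: these are run on the node that should become the sync
    source for `resource`. Nothing is executed here.
    """
    return [
        ["drbdadm", "create-md", resource],
        ["drbdadm", "up", "all"],
        ["drbdadm", "--", "--overwrite-data-of-my-peer", "primary", resource],
    ]


def render(commands):
    """Turn argv lists into shell-style strings for review."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    for resource in ("dbms-test", "dbms-test2"):
        for line in render(initial_sync_commands(resource)):
            print(line)
```

	Printing the commands first makes it easy to compare what was run on each node before touching the real resources.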

	

	I have attached /proc/drbd, /etc/drbd.conf and a section from /var/log/messages. Any pointers would be greatly appreciated.

	

	version: 8.3.5 (api:88/proto:86-91)
	GIT-hash: ded8cdf09b0efa1460e8ce7a72327c60ff2210fb build by root at hp-tm-40, 2009-11-24 12:21:46
	 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
	    ns:160896 nr:0 dw:0 dr:160896 al:0 bm:9 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:926694296
	        [>.] sync'ed:  0.1% (905040/905132)M      4972
	        stalled
	 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r----
	    ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 ua:0 ap:0 ep:1 wo:b oos:777971256
	        [>.] sync'ed:  0.3% (759736/761856)M
	        Stalled
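
	For anyone who wants to watch for this kind of stall from a script, here is a minimal Python sketch that parses text in the /proc/drbd layout shown above. It assumes that layout exactly (a "N: cs:..." header line, a counters line starting with "ns:", and a bare "stalled" line). The function name `parse_proc_drbd` and the returned dictionary shape are hypothetical and not part of DRBD.

```python
import re

# Sample copied from the /proc/drbd output quoted above.
SAMPLE = """\
 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
    ns:160896 nr:0 dw:0 dr:160896 al:0 bm:9 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:926694296
        [>.] sync'ed:  0.1% (905040/905132)M      4972
        stalled
 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r----
    ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 ua:0 ap:0 ep:1 wo:b oos:777971256
        [>.] sync'ed:  0.3% (759736/761856)M
        Stalled
"""

DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")  # e.g. " 0: cs:SyncSource ..."
COUNTER_RE = re.compile(r"(\w+):(\d+)")            # numeric fields only; skips wo:b


def parse_proc_drbd(text):
    """Return {minor: {"cs", "stalled", "counters"}} for each device block."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = {"cs": header.group(2), "stalled": False, "counters": {}}
            devices[int(header.group(1))] = current
            continue
        if current is None:
            continue  # version / GIT-hash lines before the first device
        stripped = line.strip()
        if stripped.lower() == "stalled":
            current["stalled"] = True
        elif stripped.startswith("ns:"):
            for key, value in COUNTER_RE.findall(stripped):
                current["counters"][key] = int(value)
    return devices


if __name__ == "__main__":
    for minor, info in parse_proc_drbd(SAMPLE).items():
        state = "stalled" if info["stalled"] else "ok"
        print(minor, info["cs"], state, "pe=%d" % info["counters"]["pe"])
```

	On a live system the text would come from reading /proc/drbd instead of SAMPLE. Comparing the ns/nr counters between two reads a few seconds apart is another way to confirm that nothing is moving.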

	
	what kind of network are you using between the two servers? this is almost the exact same behavior i had when i was trying to get drbd to work over 10gig ethernet. turned out to be something in drbd didn't like something about the 10gig cards i had. i eventually had to change my network cards. what cards are you using? 1gig? 10gig? have you tried other cards? that is where i would look.
	
	mike

	

