[DRBD-user] DRBD serious locking due to TOE - UPDATE

Ben Clewett ben at roadrunner.uk.com
Tue Jan 8 16:02:53 CET 2008



(Second send, used wrong email address :)

Hi Lars,

Lars Ellenberg wrote:
> grrr. stripp of one of the nots, please,
> either "not likely" or "likely not".
> anyways, it is NOT a problem in drbd.
> but you knew what I meant, right?

Many thanks, lots of good information to work with.

I knew what you meant.  I don't believe it is a problem with DRBD.  But
it manifests through DRBD.

--------

There is a known problem with my NIC, even with
the latest driver:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1199792167647+28353475&threadId=1186722

However the suggested fix here doesn't make a difference:

# ethtool -K eth2 tso off

I have a known problem which does not respond to the known fix!
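
TSO is only one of several offload features that `ethtool -K` can toggle, so one further experiment is to switch the others off as well. The sketch below only prints the commands so they can be reviewed first. The helper name `offload_off_cmds` is hypothetical, the feature list (rx, tx, sg, tso, gso) is an assumption about what this driver exposes, and nothing here is confirmed to cure the problem.

```shell
# Print (but do not run) one "ethtool -K <iface> <feature> off" command
# per offload feature. The feature names are standard `ethtool -K`
# keywords; whether this NIC driver supports each one is an assumption.
offload_off_cmds() {
    iface=$1
    for feature in rx tx sg tso gso; do
        printf 'ethtool -K %s %s off\n' "$iface" "$feature"
    done
}

# Review the output, then run it as root, for example:
#   offload_off_cmds eth2 | sh
#   ethtool -k eth2    # lower-case -k lists the resulting settings
```

A feature the driver does not support will probably just make ethtool report an error for that one command, which is why the commands are generated one per feature.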

But I still suspect the NICs...  Tonight I am going to move them all
around to see if the problem follows the card.

--------

I also know that if I reboot the servers, I'll get a few days of grace
before the locking returns.  I don't think that is DRBD related?

I also note that slowing data input, by introducing small 1-second
gaps every half minute or so, actually increases the rate at
which data is accepted.  What could cause that effect?

--------

The analysis of /proc/drbd shows data to be moving.  But I can see a
pattern:

With my symmetrical system, on both servers:
# watch -n 0.1 -d cat /proc/drbd

The DRBD resource which is not 'locked' (drbd0) is moving data
continuously.
lo:1 is the normal condition; ns and dw are increasing linearly.

However, the DRBD resource which is locked (drbd1) is only sporadically
moving data.
It can spend up to ~2 seconds stuck on:
	Primary:	bm:0 lo:n pe:0 ua:0 ap:n
	Secondary:	bm:0 lo:0 pe:0 ua:0 ap:0
Where n is some number between 1 and ~ 50.
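
The stuck pattern above can be spotted mechanically rather than by eye. A minimal sketch, assuming the counter line of /proc/drbd is a row of whitespace-separated name:value pairs as shown, and treating "lo above 0 while pe and ua are both 0" as the stall signature. The function names `drbd_field` and `drbd_line_stalled` are hypothetical.

```shell
# Print the value of one counter (e.g. lo, pe, ua, ap) taken from a
# /proc/drbd counter line.  $1 = counter name, $2 = the line.
drbd_field() {
    printf '%s\n' "$2" | tr ' \t' '\n\n' |
        awk -F: -v key="$1" '$1 == key { print $2; exit }'
}

# Succeed (exit status 0) when the line shows the stuck pattern:
# local I/O outstanding (lo > 0) but nothing pending or unacknowledged.
# A missing counter is treated as 0.
drbd_line_stalled() {
    lo=$(drbd_field lo "$1")
    pe=$(drbd_field pe "$1")
    ua=$(drbd_field ua "$1")
    [ "${lo:-0}" -gt 0 ] && [ "${pe:-0}" -eq 0 ] && [ "${ua:-0}" -eq 0 ]
}

# Example polling loop (not run here): log a timestamp for each stalled line.
#   while sleep 1; do
#       grep 'lo:' /proc/drbd | while IFS= read -r line; do
#           drbd_line_stalled "$line" && echo "$(date +%T) $line"
#       done
#   done
```

Logging timestamps on both servers at once would show whether the stalls on one node line up with bursts of traffic from the other.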

It looks like the traffic flow from one server is blocking the traffic
flow from the other, as if it has a higher priority?

This might also explain why introducing 1-second gaps helps get data
moving: it stops one server, letting the other work.

But it might also be coincidence...

I have checked that no traffic shaping is configured.  I also raised the
FIFO buffer limit of the 'bfifo' qdisc from 10K to 10M.  It makes no
difference.

Besides which, I am only using about 5% of the bandwidth, so traffic
shaping should be irrelevant.

----

You also asked about mounting:

# cat /proc/mounts | grep drbd

/dev/drbd1 /dbms-07-02 reiserfs rw 0 0

Do you know of any options which might help?

Thanks again for your help,

Ben

Lars Ellenberg wrote:
> On Tue, Jan 08, 2008 at 12:21:21PM +0100, Lars Ellenberg wrote:
>> On Tue, Jan 08, 2008 at 09:06:44AM +0000, Ben Clewett wrote:
>>>
>>> Dear Lars,
>>>
>>> I found a server at lock when I got to my desk this morning.  Not wanting to
>>> waist any time, these are the numbers you ask for.
>>>
>>> Lock on 'hp-tm-02', twin with 'hp-tm-04' which is partially locked.
>>>
>>> I use the term 'lock' to explain a server with high load and very much
>>> reduced throughput.
>> as long as there is still throughput,
>> it is not likely not a problem in drbd.
> 
> grrr. stripp of one of the nots, please,
> either "not likely" or "likely not".
> anyways, it is NOT a problem in drbd.
> but you knew what I meant, right?
> 





