[DRBD-user] RE: drbd 0.6.12 + heartbeat + synchronization + machine load

Thu May 20 13:14:56 CEST 2004

/ 2004-05-20 12:34:49 +0200
\ Zanen, Sietse van:

Sir, please get yourself a mail program/interface that is able to get
the quoting right. Please.

> > > The SW RAID seems to outperform the HW RAID by 100%
>
> > yes. :) 

> Now, in the past few days it seemed more and more like this was
> my problem. I did some speed testing on the HW and SW raid. I
> discovered, that write rates on the HW RAID just didn't get any better
> than 5-6MB/s. As the sync rate was higher than that it overloaded my
> system. I tried fiddling with syncmax, but the only thing I seemed to
> achieve is to delay the locking up of both systems (ie even with
> sync-max to 1MB/s, it would lock up after about 30mins).
> 	Luckily the DELL 2400 comes with BIOS option to switch from HW
> raid to SCSI. I did and now both systems are SW RAID, no more lock ups,
> but sync speed now seems to drop to around 700KB/s after 15-20 minutes
> 	Leaves the question though why under / over performance of one
> side of the link causes both machines to lock up. This shouldn't happen.
> Never.

hm. as long as they get data through, they only get *really* slow.
and since in the 2.4 kernel *one* dead-slow block device will
slow down *all* block devices, the boxes seem hung.

if they really no longer get data through,
have a look at the "ko-count".

> > > On rare occasions I saw lock-ups of fsck or mount during heartbeat
> > > start-up. One time even causing entire system to hang during reboot
> > > (killall was not able to kill a hanging mount process.)
> > > Maybe also important info: Some md devices were syncing at the same time
> > > drbd devices were syncing. This too was not achieving high speeds. You
> > > would expect this when drbd sync uses 5MB/s, but not when that drops.
> > > You would then expect md sync to go faster, but it didn't; it would
> > > stay at 100-300KB/s.
> > 	 
> > so don't expect performance, when you thrash your harddisks on
> > an SMP box with several kernel threads *plus* applications.
> > no wonder your box goes south.
> >  
> > on a very different scale of course, but what you are doing is
> > similar to running a high performance database with its backing
> > storage being a floppy. if your hardware cannot cope with what
> > your software demands, then that's what you get: unresponsive
> > systems at best. 

> 	I disagree, syncing should NEVER thrash harddisks. MD sync
> doesn't.

DRBD does not either.

well, that's the problem. you now have *several independent threads*
that try to sync. to MD, the DRBD syncer is just another user.

they both do sequential access, but on different locations, both try to
read in / write out the full device. but they don't coordinate, so you
get seeking back and forth whenever the other thread takes control.
and as the gap between the locations they access gets larger,
seektime increases, since it needs to seek over more and more cylinders.
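to illustrate the growing gap, here is a toy model in Python. it is only
a sketch under simplifying assumptions: two sequential streams share one
disk and take strict turns, each advancing by a fixed number of blocks per
turn. the function name and the rates are made up for illustration; this
is not how MD or DRBD actually schedule their I/O.

```python
def seek_distances(rate_a, rate_b, turns):
    """Toy model: two sequential syncers take turns on one disk.

    rate_a, rate_b: blocks each syncer advances per turn (hypothetical).
    Returns the distance the head must travel when switching from one
    syncer's position to the other's, after each round of turns.
    """
    pos_a = pos_b = 0
    distances = []
    for _ in range(turns):
        pos_a += rate_a  # syncer A handles its next sequential chunk
        pos_b += rate_b  # syncer B handles its next sequential chunk
        # switching between them means seeking across the gap
        distances.append(abs(pos_a - pos_b))
    return distances


if __name__ == "__main__":
    # streams progressing at different speeds drift apart linearly
    print(seek_distances(rate_a=4, rate_b=1, turns=5))
```

in this model the seek distance per switch grows with every round as long
as the two streams progress at different speeds.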

> Sync prio just lowers as IO increases. Also the apps I was
> running weren't even generating IO at the time. Also, as the config
> might suggest, I'm setting up a syslog collector cluster, which needs to
> process around 128Kbits/s of log data. I would actually expect to be
> able to write that to a floppy, wouldn't I? So, if MD sync is doing
> 100KB/s, drbd sync should be able to do 10MB/s easily and vice versa.

you forget about the seektime.

if you have sustained throughput for single-threaded sequential access
of, say, 15MB/s, then if you add 40 deterministic seeks over half of
the device per second at, say, 20ms each, you spend 800ms of every
second seeking and are at about 3MB/s. add in some sporadic random
seeks, plus network latency, plus seektime on the peer...
the figures are just an example.
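
the arithmetic behind that estimate, as a small Python sketch. the
function name is hypothetical and the model is deliberately crude: it
assumes the time spent seeking is simply lost and the rest of each second
runs at the full sequential rate.

```python
def effective_throughput(seq_mb_per_s, seeks_per_s, seek_ms):
    """Throughput left once part of every second is spent seeking.

    Assumes seek time is pure loss and the remaining time transfers
    data at the full sequential rate (a simplification).
    """
    seek_fraction = min(1.0, seeks_per_s * seek_ms / 1000.0)
    return seq_mb_per_s * (1.0 - seek_fraction)


if __name__ == "__main__":
    # example figures from above: 15 MB/s, 40 seeks/s, 20 ms each
    print(round(effective_throughput(15, 40, 20), 2))
```

with the example figures, 40 * 20ms = 800ms of every second goes to
seeking, so only a fifth of the sequential rate remains.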

it is latency that kills your performance.

	Lars Ellenberg