[DRBD-user] DRBD 0.7 over LVM crash debian etch server

Florian Haas florian.haas at linbit.com
Mon Nov 5 17:09:49 CET 2007

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Adrien,

On Thursday 01 November 2007 10:39:02 Graham Wood wrote:
> >   on prod1 {
> >     device     /dev/drbd0;
> >     disk       /dev/sda3;
> >     address    10.0.0.229:7788;
> >     meta-disk  internal;
> >   }
> >
> >   on data {
> >     device    /dev/drbd0;
> >     disk      /dev/vg0/part1;
> >     address   10.0.0.242:7788;
> >     meta-disk internal;
> >   }
> > }
>
> This is (hopefully) a really silly question, but you are using a
> different local port on the backup server for each device, aren't you?
>
> The other issue you will have is that with 15 production servers
> (guessing from the 500Go/30Go) hitting a single spindle to do the
> initial sync, your backup server is going to be hammered.

I agree with what Graham is saying. However, the performance issue you are 
facing is not limited to the initial sync (and every subsequent resync); it 
affects replication during normal operation as well. And since you are using 
synchronous replication (protocol C), that will also quite adversely affect 
performance on your so-called "production" servers.
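
To make Graham's first point concrete: on the "backup" box, every resource 
needs its own TCP port (and its own DRBD minor). A second resource replicating 
from another production server could look roughly like this -- the hostname 
prod2, the 10.0.0.230 address, the /dev/vg0/part2 volume and port 7789 are 
made up for the example:

  resource r1 {
    protocol C;                     # synchronous replication, as in your current setup
    on prod2 {                      # hypothetical second production server
      device    /dev/drbd0;
      disk      /dev/sda3;
      address   10.0.0.230:7788;
      meta-disk internal;
    }
    on data {
      device    /dev/drbd1;         # next free DRBD minor on the backup box
      disk      /dev/vg0/part2;     # a separate logical volume for this resource
      address   10.0.0.242:7789;    # a port no other resource on "data" uses
      meta-disk internal;
    }
  }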

Beyond that, your design is unfortunately fundamentally flawed with regard to 
performance. I have previously blogged about the misconceptions that underlie 
it; see 
http://blogs.linbit.com/florian/2007/06/22/performance-tuning-drbd-setups/ 
(particularly the paragraph starting with "don't buy a crappy box for use as 
the Secondary"; what is said there applies even more to your many-to-one 
approach). In essence, the I/O bandwidth available to _every_ "production" 
server is divided by n, where n is the number of "production" servers. And if 
just one of your servers manages to saturate the available I/O bandwidth on 
the "backup" server, writes on _all_ servers are likely to almost grind to a 
halt. Network bandwidth may also become a bottleneck, unless your "backup" 
server has a much fatter pipe than your "production" boxes.

I'm sorry to say, you have designed a performance killer. And since you are 
using that Secondary for backup purposes, you can't even switch to protocol A 
or B (which would somewhat alleviate your problem), since that would defeat 
the very purpose of a backup. For your design to stand a chance, your "backup" 
server's I/O stack would have to offer several times better write performance 
than each of your "production" servers -- assuming any non-negligible write 
load on the latter. Not to mention that you had better put each of your backup 
volumes on a spindle of its own, at the very least.
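
If you do keep some variant of this setup, you can at least tell LVM which 
physical volume each backup LV should be allocated from, instead of stacking 
them all on the same disk. Something along these lines -- device names and 
sizes are examples only:

  # put each backup LV on its own dedicated physical volume
  pvcreate /dev/sdb1 /dev/sdc1
  vgextend vg0 /dev/sdb1 /dev/sdc1
  lvcreate -L 30G -n part1 vg0 /dev/sdb1   # part1 allocated from /dev/sdb1 only
  lvcreate -L 30G -n part2 vg0 /dev/sdc1   # part2 allocated from /dev/sdc1 only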

You really need to fix that design. IMHO it doesn't make sense to even 
continue testing with your current setup. Sorry if I'm sounding harsh; I am 
just trying to save you effort that looks quite futile to me. Unless, perhaps, 
your write load is very, very low. Even in that case, you would have to fix 
that resync rate in your configuration -- or force resyncs to run 
non-simultaneously. But if your write load were really that low, I assume you 
never would have turned to DRBD in the first place.
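
By "fix that resync rate" I mean something like the following in each 
resource's syncer section (the numbers are only an example; size them to what 
your "backup" box can actually absorb). In 0.7 you can also use sync groups to 
keep resyncs from running in parallel:

  syncer {
    rate  5M;    # example value: cap resync bandwidth per resource
    group 2;     # give each resource its own group number; groups resync
                 # one after another, in ascending order
  }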

Cheers,
Florian

-- 
: Florian G. Haas
: LINBIT Information Technologies GmbH
: Vivenotgasse 48, A-1120 Vienna, Austria


