[DRBD-user] DRBD makes user copies time out when secondary node is up

Lars Ellenberg lars.ellenberg at linbit.com
Sat Jun 10 00:42:41 CEST 2006

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


/ 2006-06-09 13:01:23 -0400
\ Claude Pelletier:
> 
> Mr. Ellenberg
> 
> I have a problem with DRBD

hope you don't mind me CC-ing the drbd-user list,
where I think this mail should have been posted in the first place,
anyway.

> This is the scenario.
> We are using Samba on the Linux box, replicated through drbd.
> 
> They are using a 10MB line between the 2 machines; they are in different towns.

I take it this is a 10 megabit line, right?
so you have a throughput of about 1 megabyte (octet) per second.

> When the user copies a 500 MB file on the PRIMARY Linux server and drbd
> is up on both nodes

well, that will take at least 500 seconds, then.
that is more than 8 minutes.
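
spelled out (assuming the nominal 10 Mbit/s, and ignoring protocol
overhead, which only makes it worse):

    10 Mbit/s  =  10/8 MByte/s  ~=  1.25 MByte/s  (in theory)
    500 MByte  /  1 MByte/s      =  500 s  ~=  8 min 20 s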

and if that line does not behave, i.e. the ack packets have difficulty
getting through when congested, this will feel very sluggish.

> it looks like the copy gets timed out and loses contact with the PRIMARY
> Linux box.

well, then the behaviour of your line when congested (e.g. its ping rtt)
requires higher timeout settings.
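
(to get an idea of those parameters, you could for example watch the
rtt while the line is busy; addresses taken from your drbdsetup
output below:

    # on the primary (10.10.1.3), while a large copy is in flight:
    ping -c 20 10.10.2.66

if the worst-case rtt there gets anywhere near the 6 second drbd
timeout, that matches the disconnects you see.)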

> The changes I made to try to fix this were
> first changing the protocol from C to A
> and changing the sndbuf-size from the default to 512K

this will only affect the initial latency.
it cannot possibly convert a 10 MBit line into a GigE connection.
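
a 512K send buffer in front of a ~1 MByte/s link is drained in about
half a second:

    512 KByte / 1024 KByte/s  ~=  0.5 s

so it absorbs the first half second of writes, and from then on you
are back to line speed.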

> Looks like those changes didn't make a difference. The only way I know
> right now to fix the problem is to take the secondary node down.
> 
> When the primary is in the state Primary/Unknown the copying of files works great,
> but I don't have replication anymore.

well, of course. you then get the local io-bandwidth.

> drbdsetup /dev/drbd0 show
> 
> 
> Lower device: 253:02   (dm-2)
> Meta device: 08:113   (sdh1)
> Meta index: 0
> Disk options:
> Local address: 10.10.1.3:7789
> Remote address: 10.10.2.66:7789
> Wire protocol: A
> Net options:
>  timeout = 6.0 sec (default)

if you hit this too often, increase it.
or improve the parameters of your link.

>  connect-int = 10 sec (default)
>  ping-int = 10 sec (default)

these should then probably be increased, too.
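
for illustration only, in drbd.conf that could look like the
following (the resource name and the values are made up, tune them to
your measured worst-case rtt; timeout is given in units of 0.1
seconds, the intervals in full seconds):

    resource r0 {            # "r0" is just a placeholder name
      net {
        timeout     120;     # 12.0 sec instead of the default 6.0
        connect-int  20;     # sec, instead of the default 10
        ping-int     20;     # sec, instead of the default 10
      }
    }

and then a "drbdadm adjust r0" to apply it. note that timeout should
stay below both connect-int and ping-int, otherwise the peer can be
declared dead before it had a chance to answer a ping.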

>  max-epoch-size = 2048  (default)
>  max-buffers = 2048  (default)
>  sndbuf-size = 524288
>  ko-count = 0  (default)
> Syncer options:
>  rate = 71680 KB/sec
>  group = 0  (default)
>  al-extents = 127  (default)
> 
> I'm almost sure I need to set other parameters to improve the performance of
> drbd on the primary node, or maybe tune the ones I've got.

if your replication link has a bandwidth of
1 megabyte per second, drbd cannot write faster.
that is just common sense. no drbd config option
can make that link go faster.

since you probably have an encrypting VPN between the sites
(you should have, anyway) you can try to figure out whether you can use
some compression settings on that vpn (if it burns cpu cycles to
encrypt, a few more to at least try to compress won't hurt).
for "normal" files that could probably double or triple the speed.
still not very fast with a base rate of 10 MBit,
probably feels like burning a cd in an old quad-speed drive...
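
just as a sketch: if that vpn happened to be openvpn, lzo
compression is a single directive on both ends (other vpns spell
this differently):

    # in the openvpn config on both sites:
    comp-lzo

already compressed data (zip, jpeg, mpeg) won't shrink any further,
so the actual gain depends entirely on the file mix.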

note, however, that _read_ requests are carried out locally,
and thus should still be as fast as your local io bandwidth can provide.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :


