[DRBD-user] What is the DRBD write block size (if any)?

roberto.fastec at gmail.com
Thu Aug 5 10:44:23 CEST 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Still about raw disks:

Does DRBD write with some adjustable block size,
or is this set at the Linux system level?
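
For reference, the sizes the kernel itself reports for a DRBD device can be
inspected like this (just a sketch; /dev/drbd0 and the sysfs paths are
examples from a 2.6 kernel):

blockdev --getss /dev/drbd0                   # logical sector size
blockdev --getbsz /dev/drbd0                  # block size used by the kernel
cat /sys/block/drbd0/queue/hw_sector_size     # hardware sector size
cat /sys/block/drbd0/queue/max_sectors_kb     # per-request size limit, in KiB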

Thank you for any clarification and tip.

R.


Email reaches you everywhere with BlackBerry® from Vodafone!

-----Original Message-----
From: drbd-user-request at lists.linbit.com
Sender: drbd-user-bounces at lists.linbit.com
Date: Thu, 05 Aug 2010 09:49:42 
To: <drbd-user at lists.linbit.com>
Reply-To: drbd-user at lists.linbit.com
Subject: drbd-user Digest, Vol 73, Issue 4

Send drbd-user mailing list submissions to
	drbd-user at lists.linbit.com

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.linbit.com/mailman/listinfo/drbd-user
or, via email, send a message with subject or body 'help' to
	drbd-user-request at lists.linbit.com

You can reach the person managing the list at
	drbd-user-owner at lists.linbit.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of drbd-user digest..."


Today's Topics:

   1. Two newbie questions (Steve Thompson)
   2. Re: Two newbie questions (Alex Dean)
   3. Re: DRBD9 (Lars Ellenberg)
   4. barrier mode on LVM containers (Sebastian Hetze)
   5. Re: barrier mode on LVM containers (Lars Ellenberg)
   6. Re: Two newbie questions (Bart Coninckx)
   7. Re: Two newbie questions (Ben Beuchler)
   8. Re: Two newbie questions (Bart Coninckx)
   9. Re: existing volume (Rohit Upadhyay)


----------------------------------------------------------------------

Message: 1
Date: Wed, 4 Aug 2010 10:12:20 -0400 (EDT)
From: Steve Thompson <smt at vgersoft.com>
Subject: [DRBD-user] Two newbie questions
To: drbd-user at lists.linbit.com
Message-ID:
	<alpine.LRH.0.9999.1008041002530.22003 at firefly.vgersoft.com>
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII

New to DRBD; using it with two Dell PE2900 systems, CentOS 5.5 64-bit. Two 
questions:

(1) I am using dual back-to-back gigabits with mtu=9000 for replication 
with bonding in balance-rr mode with no issues. The documentation suggests 
that active-backup mode be used. I'm getting an iperf performance across 
this link of 1.97 Gbits/sec, and a drbd sync of a single volume gets about 
133 MB/sec, which is just a hair quicker than bonnie++ tells me that this 
file system can block-write. I'm interested in why the active-backup 
recommendation, and other folks' experiences with bonding.
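
For anyone comparing the two modes, a minimal CentOS 5 bonding sketch
(interface names, the address and the miimon value are only examples):

# /etc/modprobe.conf -- the mode being compared: balance-rr vs. active-backup
alias bond0 bonding
options bond0 mode=balance-rr miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.0.0.1
NETMASK=255.255.255.0
MTU=9000
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth1 (and likewise for eth2)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none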

(2) System A (primary) and system B (secondary). If B is shut down, A 
maintains a list of out-of-sync blocks for B. Where is this kept? If in 
the metadata (internal), how often is it updated on disk? If A is shut 
down and rebooted before B comes alive, any chance of losing any updates?

Steve
----------------------------------------------------------------------------
Steve Thompson                 E-mail:      smt AT vgersoft DOT com
Voyager Software LLC           Web:         http://www DOT vgersoft DOT com
39 Smugglers Path              VSW Support: support AT vgersoft DOT com
Ithaca, NY 14850
   "186,300 miles per second: it's not just a good idea, it's the law"
----------------------------------------------------------------------------


------------------------------

Message: 2
Date: Wed, 4 Aug 2010 11:30:15 -0500
From: Alex Dean <alex at crackpot.org>
Subject: Re: [DRBD-user] Two newbie questions
To: drbd-user at lists.linbit.com
Message-ID: <5B5053AF-5720-4A73-8E14-F1AF85506691 at crackpot.org>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes


On Aug 4, 2010, at 9:12 AM, Steve Thompson wrote:

> New to DRBD; using it with two Dell PE2900 systems, CentOS 5.5 64- 
> bit. Two questions:
>
> (1) I am using dual back-to-back gigabits with mtu=9000 for  
> replication with bonding in balance-rr mode with no issues. The  
> documentation suggests that active-backup mode be used. I'm getting  
> an iperf performance across this link of 1.97 Gbits/sec, and a drbd  
> sync of a single volume gets about 133 MB/sec, which is just a hair  
> quicker than bonnie++ tells me that this file system can block- 
> write. I'm interested in why the active-backup recommendation, and  
> other folks' experiences with bonding.

I recall seeing a lot of dropped packets when doing drbd replication  
over bonded crossover cables in balance-rr mode.  Things were fine  
under normal circumstances, and drops showed up when I really loaded  
the primary.  active-backup gave me enough throughput, so I switched.   
That was on HP DL380s with Centos 5.5.  I don't recall which ethernet  
driver was in use.

alex


------------------------------

Message: 3
Date: Wed, 4 Aug 2010 19:37:18 +0200
From: Lars Ellenberg <lars.ellenberg at linbit.com>
Subject: Re: [DRBD-user] DRBD9
To: drbd-user at lists.linbit.com
Message-ID: <20100804173718.GA24256 at barkeeper1-xen.linbit>
Content-Type: text/plain; charset=iso-8859-1

On Tue, Aug 03, 2010 at 02:47:01PM +0200, Piotr Kandziora wrote:
> Hi all,
> 
> 
> My question is rather directed to LINBIT's guys. I am writing here,
> because I think that answers for my questions (that you can find
> below) will be useful for DRBD community.
> 
> I have noticed that on LINBIT's website there are some details
> concerning DRBD version 9.
> 
> What is the current state of work on this project?
> When are you going to release an initial version?
> On your website we can read about the main features. Is this feature
> list up to date (and maybe you can say what more is planned)?

Well, our development resources are limited, and sponsored features,
consulting, support, integration work, ... have priority,
most of the time.

So it all happens much slower than some people would like
and there is no release date set for the initial version.

incomplete planned feature list in no particular order:

 * multiple Secondaries without stacking
 * support for daisy chaining of replication "hops"
 * "arbitrarily" large devices
 * full data log to support point-in-time recovery of arbitrary block data
 * support for time-shift replication (have one of the secondaries
   consistently lag behind by $configurable)
 * more than two Primaries
 * all sorts of performance improvements

Any particular reason you ask, a favorite feature you'd like to sponsor?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


------------------------------

Message: 4
Date: Wed, 4 Aug 2010 19:30:42 +0200
From: Sebastian Hetze <s.hetze at linux-ag.com>
Subject: [DRBD-user] barrier mode on LVM containers
To: drbd-user at lists.linbit.com
Message-ID: <20100804173043.004FE303001C at mail.linux-ag.de>
Content-Type: text/plain; charset=us-ascii

Hi *,

although the manual page for drbd.conf says that "DRBD will use the
first method that is supported by the backing storage" and
"Unfortunately device mapper (LVM) does not support barriers." we find
that barriers is the default setting for DRBD on top of LVM containers
with version 8.3.7 (api:88/proto:86-92) srcversion:
582E47DEE6FD9EC45926ECF from linux 2.6.34.1 (and probably other versions
as well)

With protocol B this can lead to a situation where the secondary node
becomes completely unusable. It looks like the secondary sends all IO
requests to the LVM layer and LVM cannot manage the queue after a
certain point.

I would expect DRBD to use the flush method on LVM containers as
default. At least if protocol B is used.




To demonstrate this behaviour, I suggest setting up a system with 10 or
more DRBD resources (using protocol B) on LVM containers and configuring
syslog.conf so that it writes local messages into each of these
resources (with sync). Given that the DRBD resources are mounted on
/srv/drbd01, /srv/drbd02, ... the syslog.conf would read:

...
local1.notice		/srv/drbd01/notice
local2.info		/srv/drbd01/info
local1.notice		/srv/drbd02/notice
local2.info		/srv/drbd02/info
and so on...

Now use logger to write to all resources simultaneously:

time {
for loop in 1 2; do
for i in `seq -w 100`; do
        logger -p local1.notice -t logger "notice number $loop $i"
        logger -p local2.info -t logger "info number $loop $i"
        echo -n .
done
echo $loop
done
}

These are only 400 small messages for each DRBD resource. On the local
file system the whole thing finishes in less than 5 seconds.

In my test setup with 10 DRBD resources the logger loop takes around
50 seconds to finish on the primary. While the primary is working with
a load below 1, the secondary load rises to 10 and stays there for a
couple of minutes. With only 10 resources the secondary recovers after
a while.
If you try the same simple test with 30 or more DRBD resources, the
secondary will reach a load of 40 and won't recover, at least not within
an hour.

With flush or protocol C it takes a couple of minutes to finish syncing
these 400 messages per resource and the secondary remains usable.
Why this must take so long is another question...


Best regards,

  Sebastian


------------------------------

Message: 5
Date: Wed, 4 Aug 2010 20:00:01 +0200
From: Lars Ellenberg <lars.ellenberg at linbit.com>
Subject: Re: [DRBD-user] barrier mode on LVM containers
To: drbd-user at lists.linbit.com
Message-ID: <20100804180001.GC24256 at barkeeper1-xen.linbit>
Content-Type: text/plain; charset=iso-8859-1

On Wed, Aug 04, 2010 at 07:30:42PM +0200, Sebastian Hetze wrote:
> Hi *,

BTW, you need to subscribe (or use your subscribed address) to post here.

> although the manual page for drbd.conf says that "DRBD will use the
> first method that is supported by the backing storage" and
> "Unfortunately device mapper (LVM) does not support barriers."

That now reads "might not support barriers".

Device mapper linear targets have supported barriers for some time now,
provided they have exactly one table entry, so extending (and thus likely
fragmenting) the mapping, or adding snapshots, would break that support.

device mapper targets in recent kernels do support barriers to a much
higher degree. In general, linux mainline aims to support barriers
throughout the stack.
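
A quick way to check how a given LV is mapped (and whether it is still a
single linear segment) is dmsetup; the VG/LV name below is just an example:

# one output line per table entry; a single "linear" line is the case in
# which older kernels could pass barriers through device mapper
dmsetup table vg0-drbd01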

> we find that barriers is the default setting for DRBD on top of LVM
> containers with version 8.3.7 (api:88/proto:86-92) srcversion:
> 582E47DEE6FD9EC45926ECF from linux 2.6.34.1 (and probably other
> versions as well)

This does not have much to do with the DRBD version.  It just depends on
whether or not the lower-level device supports barriers,
and how costly barriers or flushes are on your IO stack.
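
Which method DRBD ended up using shows up in /proc/drbd as the "wo:" flag
(b = barrier, f = flush, d = drain, n = none):

grep 'wo:' /proc/drbd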

> With protocol B this can lead to a situation where the secondary node
> becomes completely unusable. It looks like the secondary sends all IO
> requests to the LVM layer and LVM cannot manage the queue after a
> certain point.

Too bad.

> I would expect DRBD to use the flush method on LVM containers as
> default. At least if protocol B is used.

With kernels >= 2.6.24, a "flush" is implemented as "empty barrier",
so if there is no barrier support, there will be no flush support
either (except for maybe very few special cases).

> To demonstrate this behaviour, I suggest setting up a system with 10 or
> more DRBD resources (using protocol B) on LVM containers and configuring
> syslog.conf so that it writes local messages into each of these
> resources (with sync). Given that the DRBD resources are mounted on
> /srv/drbd01, /srv/drbd02, ... the syslog.conf would read:
> 
> ...
> local1.notice		/srv/drbd01/notice
> local2.info		/srv/drbd01/info
> local1.notice		/srv/drbd02/notice
> local2.info		/srv/drbd02/info
> and so on...
> 
> Now use logger to write to all resources simultaneously:
> 
> time {
> for loop in 1 2; do
> for i in `seq -w 100`; do
>         logger -p local1.notice -t logger "notice number $loop $i"
>         logger -p local2.info -t logger "info number $loop $i"
>         echo -n .
> done
> echo $loop
> done
> }
> 
> These are only 400 small messages for each DRBD resource. On the local
> file system the whole thing finishes in less than 5 seconds.

Because it is not using barriers.

> In my test setup with 10 DRBD resources the logger loop takes around
> 50 seconds to finish on the primary. While the primary is working with
> a load below 1, the secondary load rises to 10 and stays there for a
> couple of minutes. With only 10 resources the secondary recovers after
> a while.
> If you try the same simple test with 30 or more DRBD resources, the
> secondary will reach a load of 40 and won't recover, at least not within
> an hour.

 ;-)

If they are hurting you, disable barriers, then.
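
A minimal sketch of how to do that in drbd.conf with 8.3-style options (only
the relevant disk section is shown; the resource name is an example):

resource r0 {
  disk {
    no-disk-barrier;   # do not use the barrier method on the backing device
    no-disk-flushes;   # optionally skip flushes too and fall back to drain
  }
}

followed by "drbdadm adjust r0" on both nodes to apply it to a running resource.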

> With flush or protocol C it takes a couple of minutes to finish syncing
> these 400 messages per resource and the secondary remains usable.
> Why this must take so long is another question...

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


------------------------------

Message: 6
Date: Wed, 4 Aug 2010 20:32:33 +0200
From: Bart Coninckx <bart.coninckx at telenet.be>
Subject: Re: [DRBD-user] Two newbie questions
To: drbd-user at lists.linbit.com
Message-ID: <201008042032.33342.bart.coninckx at telenet.be>
Content-Type: Text/Plain;  charset="iso-8859-1"

On Wednesday 04 August 2010 18:30:15 Alex Dean wrote:
> On Aug 4, 2010, at 9:12 AM, Steve Thompson wrote:
> > New to DRBD; using it with two Dell PE2900 systems, CentOS 5.5 64-
> > bit. Two questions:
> >
> > (1) I am using dual back-to-back gigabits with mtu=9000 for
> > replication with bonding in balance-rr mode with no issues. The
> > documentation suggests that active-backup mode be used. I'm getting
> > an iperf performance across this link of 1.97 Gbits/sec, and a drbd
> > sync of a single volume gets about 133 MB/sec, which is just a hair
> > quicker than bonnie++ tells me that this file system can block-
> > write. I'm interested in why the active-backup recommendation, and
> > other folks' experiences with bonding.
> 
> I recall seeing a lot of dropped packets when doing drbd replication
> over bonded crossover cables in balance-rr mode.  Things were fine
> under normal circumstances, and drops showed up when I really loaded
> the primary.  active-backup gave me enough throughput, so I switched.
> That was on HP DL380s with Centos 5.5.  I don't recall which ethernet
> driver was in use.
> 
> alex
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
> 

So you were basically running with one gigabit NIC at the time ...
I use balance-rr and performance was better during testing, admittedly when
the storage was not stressed. I might try iperf again, but I guess the results
will be the total bandwidth minus what DRBD uses, which I cannot really tell.

B.


------------------------------

Message: 7
Date: Wed, 4 Aug 2010 14:10:39 -0500
From: Ben Beuchler <insyte at gmail.com>
Subject: Re: [DRBD-user] Two newbie questions
To: drbd-user at lists.linbit.com
Message-ID:
	<AANLkTikPuY02KPqLkdRLakA7T-Tv5akatdy_g5FAhse4 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

> system can block-write. I'm interested in why the active-backup
> recommendation, and other folks' experiences with bonding.

My recollection from Florian Haas' presentation:

Bonding works well when you have UDP traffic or multiple TCP streams.
However when you push a single TCP stream across a bonded ethernet
link, the remote end gets bogged down handling the out-of-order
packets and your overall throughput can actually *decrease*.  My notes
indicate that their experimenting showed that adding a second bonded
ethernet link resulted in about a 60% increase in throughput.  Adding
a third link dropped throughput to the same as a single link.

> (2) System A (primary) and system B (secondary). If B is shut down, A
> maintains a list of out-of-sync blocks for B. Where is this kept?

In system A's metadata.

> If in the
> metadata (internal), how often is it updated on disk?

With each write.  Marking the block "dirty" is completed before the
write() call is allowed to return.
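
A rough way to watch this while the peer is away (the resource name is an
example):

cat /proc/drbd        # the "oos:" counter is the out-of-sync amount, in KiB
drbdadm dstate r0     # disk states, e.g. UpToDate/Outdated while B is down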

> If A is shut down and rebooted before B comes alive, any chance of losing any updates?

Not under normal circumstances.  I'm sure someone will promptly
respond with a scenario that could cause data loss...

-Ben


------------------------------

Message: 8
Date: Wed, 4 Aug 2010 21:28:24 +0200
From: Bart Coninckx <bart.coninckx at telenet.be>
Subject: Re: [DRBD-user] Two newbie questions
To: drbd-user at lists.linbit.com
Message-ID: <201008042128.24098.bart.coninckx at telenet.be>
Content-Type: Text/Plain;  charset="iso-8859-1"

On Wednesday 04 August 2010 21:10:39 Ben Beuchler wrote:
> > system can block-write. I'm interested in why the active-backup
> > recommendation, and other folks' experiences with bonding.
> 
> My recollection from Florian Haas' presentation:
> 
> Bonding works well when you have UDP traffic or multiple TCP streams.
> However when you push a single TCP stream across a bonded ethernet
> link, the remote end gets bogged down handling the out-of-order
> packets and your overall throughput can actually *decrease*.  My notes
> indicate that their experimenting showed that adding a second bonded
> ethernet link resulted in about a 60% increase in throughput.  Adding
> a third link dropped throughput to the same as a single link.

Indeed, I had this problem and fixed it by changing tcp_reordering or
something similar under /proc.
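
For anyone else hitting this: the knob in question is probably the TCP
reordering tolerance (the value below is only an example; the old default is 3):

sysctl net.ipv4.tcp_reordering           # i.e. /proc/sys/net/ipv4/tcp_reordering
sysctl -w net.ipv4.tcp_reordering=127    # persist via /etc/sysctl.conf if it helps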


------------------------------

Message: 9
Date: Thu, 5 Aug 2010 13:12:13 +0530
From: Rohit Upadhyay <vivacious at sify.com>
Subject: Re: [DRBD-user] existing volume
To: drbd-user at lists.linbit.com, Mike Lovell <mike at dev-zero.net>
Message-ID:
	<AANLkTi=dNYyE5gFknB0K-MZnozqv-JSvFDRqk2qhBXVu at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

What is the advantage of DRBD's own protocol over iSCSI?
Given the availability of iSCSI initiators and targets, what was the reason
to go to the length of developing its own protocol?



On Wed, Jul 14, 2010 at 9:15 PM, Mike Lovell <mike at dev-zero.net> wrote:

>  Rohit Upadhyay wrote:
>
> Thanks Mike for answering. Another query:
>
> How is transport between 2 machines handled? What protocol is used in place
> of iSCSI / NBD?
> Can it work without DRBD installed on other side?
>
> the transport is TCP. the protocol is its own protocol. it needs drbd
> installed and configured on the remote side.
>
> mike
>
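
To illustrate that last point, both peers appear in the same resource
definition in drbd.conf, something like this (hostnames, devices, addresses
and the port are examples):

resource r0 {
  protocol C;
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7789;   # DRBD's own listener/transport endpoint (TCP)
    meta-disk internal;
  }
  on bravo {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}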

------------------------------

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


End of drbd-user Digest, Vol 73, Issue 4
****************************************

