[DRBD-user] barrier mode on LVM containers

Lars Ellenberg lars.ellenberg at linbit.com
Wed Aug 4 20:00:01 CEST 2010

On Wed, Aug 04, 2010 at 07:30:42PM +0200, Sebastian Hetze wrote:
> Hi *,

BTW, you need to subscribe (or use your subscribed address) to post here.

> although the manual page for drbd.conf says that "DRBD will use the
> first method that is supported by the backing storage" and
> "Unfortunately device mapper (LVM) does not support barriers."

That now reads "might not support barriers".

The device mapper linear target has supported barriers for some time now,
but only if the table had exactly one entry; so extending (and thus likely
fragmenting) the mapping, or adding snapshots, would break that support.

device mapper targets in recent kernels do support barriers to a much
higher degree. In general, linux mainline aims to support barriers
throughout the stack.
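As an illustration of the "exactly one table entry" condition: `dmsetup table`
shows the mapping of a device-mapper device, one line per segment. The sample
output below is invented (device numbers and offsets are made up) and mimics
an LV that was extended once and is now split across two linear segments:

```shell
# Invented `dmsetup table` output for a twice-mapped (fragmented) LV:
table='0 2097152 linear 8:16 384
2097152 2097152 linear 8:16 4194688'

# More than one mapping line: on older kernels the device-mapper linear
# target would no longer pass barriers through for such a device.
printf '%s\n' "$table" | wc -l    # prints the segment count (2 here)
```

On a real system you would run `dmsetup table /dev/mapper/<vg>-<lv>` (as root)
and count the lines of its output instead.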

> we find that barriers is the default setting for DRBD on top of LVM
> containers with version 8.3.7 (api:88/proto:86-92) srcversion:
> 582E47DEE6FD9EC45926ECF from linux (and probably other
> versions as well)

This has little to do with the DRBD version.  It just depends on
whether or not the lower-level device supports barriers,
and how costly barriers or flushes are on your IO stack.

> With protocol B this can lead to a situation where the secondary node
> becomes completely unusable. It looks like the secondary sends all IO
> requests to the LVM layer and LVM can not manage the queue after a
> certain point.

Too bad.

> I would expect DRBD to use the flush method on LVM containers as
> default. At least if protocol B is used.

With kernels >= 2.6.24, a "flush" is implemented as "empty barrier",
so if there is no barrier support, there will be no flush support
either (except for maybe very few special cases).

> To demonstrate this behaviour, I suggest to set up a system with 10 or
> more DRBD resources (using protocol B) on LVM containers and configure
> syslog.conf such that it writes local messages into each of these
> resources (with sync). Given that the DRBD resources are mounted on
> /srv/drbd01, /srv/drbd02, ...  the syslog.conf would read: 
> ...
> local1.notice		/srv/drbd01/notice
> local2.info		/srv/drbd01/info
> local1.notice		/srv/drbd02/notice
> local2.info		/srv/drbd02/info
> and so on...
> Now use logger to write to all resources simultaneously:
> time {
> for loop in 1 2; do
> for i in `seq -w 100`; do
>         logger -p local1.notice -t logger "notice number $loop $i"
>         logger -p local2.info -t logger "info number $loop $i"
>         echo -n .
> done
> echo $loop
> done
> }
> These are only 400 small messages for each DRBD resource. On the local
> file system the whole thing finishes in less than 5 seconds.

Because it is not using barriers.

> In my test setup with 10 DRBD resources the logger loop takes around
> 50 seconds to finish on the primary. While the primary is working with
> load below 1, the secondary load raises up to 10 and stays there for a
> couple of minutes. With only 10 resources the secondary recovers after
> a while.
> If you try the same simple test with 30 or more DRBD resources the
> secondary will get a load of 40 and won't recover, at least not within
> an hour.

If they are hurting you, then disable barriers.
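With DRBD 8.3 that can be done per resource in the disk section of drbd.conf;
a sketch (the resource name is a placeholder), which makes DRBD fall back to
the flush or drain write-ordering method instead of barriers:

```
resource r0 {
  disk {
    no-disk-barrier;    # skip the barrier method on the backing device
    # no-disk-flushes;  # uncomment to also skip flushes (drain only)
  }
}
```

After adjusting the configuration, `drbdadm adjust r0` applies it to the
running resource.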

> With flush or protocol C it takes a couple of minutes to finish syncing
> these 400 messages per resource and the secondary remains usable.
> Why this must take sooo long is another question...

: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
please don't Cc me, but send to list   --   I'm subscribed
