[DRBD-user] barrier mode on LVM containers

Wed Aug 4 19:30:42 CEST 2010

Hi *,

Although the manual page for drbd.conf says that "DRBD will use the
first method that is supported by the backing storage" and
"Unfortunately device mapper (LVM) does not support barriers.", we find
that barrier is the default method for DRBD on top of LVM containers
with version 8.3.7 (api:88/proto:86-92) srcversion:
582E47DEE6FD9EC45926ECF from Linux 2.6.34.1 (and probably other versions
as well).

With protocol B this can lead to a situation where the secondary node
becomes completely unusable. It looks like the secondary sends all I/O
requests to the LVM layer and LVM cannot manage the queue after a
certain point.

I would expect DRBD to use the flush method on LVM containers by
default, at least when protocol B is used.
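
For reference, one way to select the flush method explicitly in 8.3
should be to disable barriers in the disk section, so that DRBD falls
back to the next method it supports. This is only a sketch: the
resource name r01 is a placeholder, and the rest of the resource
definition is assumed to stay as it is.

```
resource r01 {
  protocol B;
  disk {
    no-disk-barrier;   # assumption: skips barriers so DRBD falls back to flushes
  }
  # remaining sections of the (hypothetical) resource unchanged
}
```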

To demonstrate this behaviour, I suggest setting up a system with 10 or
more DRBD resources (using protocol B) on LVM containers and configuring
syslog.conf such that it writes local messages into each of these
resources (with sync). Given that the DRBD resources are mounted on
/srv/drbd01, /srv/drbd02, ... the syslog.conf would read:

...
local1.notice		/srv/drbd01/notice
local2.info		/srv/drbd01/info
local1.notice		/srv/drbd02/notice
local2.info		/srv/drbd02/info
and so on...
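
Typing these lines by hand gets tedious with 30 resources, so they
could be generated. A minimal sketch, assuming the mount points follow
the /srv/drbdNN pattern above; the function name gen_syslog_lines is
made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the syslog.conf lines for the first N
# resources, assuming mount points /srv/drbd01, /srv/drbd02, ...
gen_syslog_lines() {
    count=$1
    n=1
    while [ "$n" -le "$count" ]; do
        # %02d zero-pads the resource number to match drbd01, drbd02, ...
        printf 'local1.notice\t\t/srv/drbd%02d/notice\n' "$n"
        printf 'local2.info\t\t/srv/drbd%02d/info\n' "$n"
        n=$((n + 1))
    done
}

# Print the lines for 10 resources to stdout.
gen_syslog_lines 10
```

The output is meant to be reviewed and then added to syslog.conf by
hand; the script itself does not touch any configuration file.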

Now use logger to write to all resources simultaneously:

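time {
    # Two passes of 100 iterations; each iteration logs one notice
    # and one info message.
    for loop in 1 2; do
        for i in $(seq -w 100); do
            logger -p local1.notice -t logger "notice number $loop $i"
            logger -p local2.info -t logger "info number $loop $i"
            printf .    # progress marker
        done
        echo "$loop"
    done
}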

These are only 400 small messages for each DRBD resource (2 passes x
100 iterations x 2 messages). On the local file system the whole thing
finishes in less than 5 seconds.

In my test setup with 10 DRBD resources the logger loop takes around
50 seconds to finish on the primary. While the primary is working with
a load below 1, the load on the secondary rises to 10 and stays there
for a couple of minutes. With only 10 resources the secondary recovers
after a while.
If you try the same simple test with 30 or more DRBD resources, the
secondary will reach a load of 40 and won't recover, at least not within
an hour.
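
To follow the load on the secondary while the test runs, something like
the following could be used. It is only a sketch: the function names
load1 and watch_load are invented here, the 10 second interval is
arbitrary, and it assumes a Linux /proc/loadavg.

```shell
#!/bin/sh
# Hypothetical helper: print the 1-minute load average, i.e. the first
# space-separated field of a /proc/loadavg-style file.
load1() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

# Hypothetical helper: print a timestamped load sample every 10 seconds,
# for the given number of samples.
watch_load() {
    samples=$1
    while [ "$samples" -gt 0 ]; do
        printf '%s load=%s\n' "$(date +%T)" "$(load1)"
        samples=$((samples - 1))
        if [ "$samples" -gt 0 ]; then
            sleep 10
        fi
    done
}
```

Running "watch_load 30" on the secondary would then cover about five
minutes of the test.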

With flush or protocol C it takes a couple of minutes to finish syncing
these 400 messages per resource and the secondary remains usable.
Why this must take sooo long is another question...

Best regards,

  Sebastian