[DRBD-announce] drbd-9.1.6-rc.1

Philipp Reisner philipp.reisner at linbit.com
Mon Feb 7 08:11:47 CET 2022


In this cycle, we learned that DRBD was not dealing correctly with internal
meta-data on backing devices larger than 128TB. Why did we not discover this
earlier, given that we claim DRBD can handle up to 1PB per volume/device?
Well, we tested with device-mapper-based thinly provisioned LVs. It turns out
that device-mapper considers 0-byte sized write requests valid. The bug was
that DRBD issued writes with a length of 0 bytes. The first customer that
used a SCSI device larger than 128TB then made us aware of the problem.

Considerable effort went into recovering IO and unfreezing requests after
an IO-frozen DRBD unfreezes.

Another surprise was that DRBD caused unnecessary allocations in thinly
provisioned backing devices when the resync was throttled. We took the
chance to rework this part of the code. Now it will never cause such
unnecessary allocations. The real resync rate is now closer to the requested
c-max-rate setting than before.

A drawback of the new implementation is that with high discard granularities
(e.g. 512KB) the slowest it can go is 5MB/sec. I believe that is just an
academic problem: in the real world people want to resync a lot quicker, and
real devices with such high discard granularity are rare.

At the moment, this is a release candidate. Please give it a try.

9.1.6-rc.1 (api:genl2/proto:110-121/transport:17)
 * fix IO to internal meta-data for backing device larger than 128TB
 * fix resending requests towards diskless peers, this is relevant when
   fencing is enabled, but the connection is re-established before fencing
   succeeds; when the bug triggered it led to "stuck" requests
 * remove lockless buffer pages handling; it still contained very hard to
   trigger bugs
 * make sure DRBD's resync does not cause unnecessary allocation in a
   thinly provisioned backing device on a resync target node
 * avoid unnecessary resync (or split-brain) due to a wrongly generated
   new current UUID when an already IO-frozen DRBD gets new writes
 * small fix to autopromote, when an application tries a read-only open
   before it does a read-write open immediately after the peer primary
   vanished ungracefully
 * split out the secure boot key into a package of its own; this is
   necessary to allow installation of multiple drbd kernel module packages
 * support for building containers for Flatcar Linux

