[DRBD-user] drbd-9.0.15

Philipp Reisner philipp.reisner at linbit.com
Tue Aug 14 14:36:53 CEST 2018


This is an upgrade every drbd-9 user should install. It has two important
fixes, in the following areas:

* handling IO errors reported by the backing device

  Handling of IO errors on the backend had been completely broken since
  drbd-9.0, including the recovery options such as replacing a failed
  disk. Even worse, after the disk was replaced it was possible that
  DRBD would read from the new disk before the full sync finished.
  All fixed now, but very embarrassing.

* correctly handle UUIDs in case of live-migration

  That was the root cause of various strange behaviors, e.g. a node
  considering some other node as not up-to-date while that peer
  considers itself up-to-date.

The goody of the release is that the submit code path was optimized
a bit, which gives up to a 30% increase in IOPS (depending on CPU model
and performance of the backing device).

A lot of effort was spent on writing more tests for the drbd9 test suite
( https://github.com/LINBIT/drbd9-tests ). DRBD-8.4 had its own, which
was more complete in its time, but it is now overdue to have test
coverage at least as good for the drbd-9 code base.

Apart from work on the test suite we will continue to put effort into
optimizing the IO submit code path. Very fast NVMe devices keep the
pressure on us to be able to fully utilize them when they are used as
backing devices for DRBD.

Note: We will update the PPA on Thursday (Aug 16). Sorry for the delay
      (vacations and a bank holiday are the reasons)

9.0.15 (api:genl2/proto:86-114/transport:14)
 * fix tracking of changes (on a secondary) against the lost disk of a
   primary and also fix re-attaching in case the disk is replaced (has
   new meta-data)
 * fix live migration of VMs on DRBD when migrated to/from diskless
   nodes; before this fix, a race condition could lead to one of the
   nodes seeing the other one as consistent only
 * fix an IO deadlock in DRBD when the activity log on a secondary runs full;
   in the real world this was very seldom triggered, but it can easily be
   reproduced with a workload that touches one block every 4M and writes
   them all in a burst
 * fix hanging demote after IO error followed by attaching the disk again
   and the corresponding resync
 * fix DRBD dropping connection after an IO error on the secondary node
 * new module parameter to disable support for older protocol versions;
   in case you configured peers that are not expected to connect, it
   might have positive effects, because then this node does not need to
   assume that such a peer is ancient
 * improve details of changing devices online from diskless to having a
   disk and vice versa (including peers freeing bitmap slots)
 * remove no longer relevant compat tests
 * expose openers via debugfs; that helps to answer the questions why
   DRBD does not demote to secondary and why it tells me "Device is held
   open by someone"
 * optimize IO submit code path; this can improve IOPS by up to 30% on a
   system with fast backend storage and lowers the CPU load caused by DRBD
   on every workload
 * compat for v4.18 kernel
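
The access pattern named in the activity log item above (one block every
4M, all written in a burst) can be sketched in a few lines of Python. This
is only an illustration of the pattern, not the reproducer from the test
suite: the function names, the 4 KiB block size and the fill byte are
assumptions made for this example, and whether such a run hits the old
deadlock also depends on the DRBD configuration. Point it only at a
scratch file or a device whose contents you can afford to lose.

```python
import os

STRIDE = 4 * 1024 * 1024   # "every 4M", as described in the changelog item
BLOCK = 4096               # assumed block size; not stated in the announcement


def stride_offsets(total_size, stride=STRIDE, block=BLOCK):
    """Return the byte offsets of one block per stride that fit in total_size."""
    return list(range(0, total_size - block + 1, stride))


def burst_write(path, total_size, stride=STRIDE, block=BLOCK):
    """Write one block at every stride offset back to back, then flush once.

    Hypothetical helper for illustration only. Returns the number of
    blocks written.
    """
    payload = b"\xa5" * block
    offsets = stride_offsets(total_size, stride, block)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # No sync between the writes, so they are issued as one burst.
        for off in offsets:
            os.pwrite(fd, payload, off)
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(offsets)
```

Called as burst_write("/path/to/scratch", size_in_bytes), it touches each
4M region of the target exactly once.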


best regards,
LINBIT | Keeping The Digital World Running

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
