[DRBD-user] drbd-9.0.15

Tue Aug 14 17:05:49 CEST 2018

On 2018-08-14 08:36 AM, Philipp Reisner wrote:
> Hi,
> 
> This is an upgrade every drbd-9 user should follow. It has two important
> fixes in the areas of 
> 
> * handling IO errors reported by the backing device
> 
>   Handling of IO errors on the backend has been completely broken since
>   drbd-9.0, including recovery options such as replacing a failed
>   disk. Even worse, after the disk was replaced it was possible that
>   DRBD would read from the new disk before the full sync finished.
>   All fixed now, but very embarrassing.
> 
> * correctly handle UUIDs in case of live-migration
> 
>   That was the root cause of various strange behaviors, e.g. a node
>   considering some other node as not up-to-date while that peer
>   considers itself up-to-date.
> 
> The goody of the release is that the submit code path was optimized
> a bit, and that gives up to 30% increase (depending on CPU model and
> performance of the backing device) in IOPS.
> 
> A lot of effort was spent on writing more tests for the drbd9 test suite
> ( https://github.com/LINBIT/drbd9-tests ). DRBD-8.4 had its own, which
> was more complete in its time, but it is now overdue to have test
> coverage at least as good for the drbd-9 code base.
> 
> Apart from work on the test suite we will continue to put effort into
> optimizing the IO submit code path. Very fast NVMe devices keep the
> pressure on us to be able to fully utilize them when used as backing
> device for DRBD.
> 
> Note: We will update the PPA on Thursday (Aug 16). Sorry for the delay
>       (vacations and a bank holiday are the reasons)
> 
> 9.0.15 (api:genl2/proto:86-114/transport:14)
> --------
>  * fix tracking of changes (on a secondary) against the lost disk of a
>    primary and also fix re-attaching in case the disk is replaced (has
>    new meta-data)
>  * fix live migration of VMs on DRBD when migrated to/from diskless
>    nodes; before this fix a race condition could lead to one of the nodes
>    seeing the other one as consistent only
>  * fix an IO deadlock in DRBD when the activity log on a secondary runs full;
>    in the real world this was very seldom triggered, but it can be easily
>    reproduced with a workload that touches one block every 4M and writes
>    them all in a burst
>  * fix hanging demote after IO error followed by attaching the disk again
>    and the corresponding resync
>  * fix DRBD dropping connection after an IO error on the secondary node
>  * new module parameter to disable support for older protocol versions,
>    and in case you configured peers that are not expected to connect it
>    might have positive effects because then this node does not need to
>    assume that such a peer is ancient
>  * improve details when changing devices online from diskless to with-disk
>    and vice versa (including peers freeing bitmap slots)
>  * remove no longer relevant compat tests
>  * expose openers via debugfs; that helps to answer the question of why
>    DRBD does not demote to secondary and why it tells me "Device is held
>    open by someone"
>  * optimize IO submit code path; this can improve IOPS up to 30% on a system
>    with fast backend storage; lowers CPU load caused by DRBD on every workload
>  * compat for v4.18 kernel
> 
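
The activity-log entry above describes its reproducer only in words ("touches
one block every 4M and writes them all in a burst"). A minimal sketch of that
access pattern in Python follows. It is not the test LINBIT used; the 4096-byte
block size and the function names are assumptions made for illustration, and
whether buffered writes like these would actually trigger the old deadlock
depends on details the changelog does not give.

```python
import os

STRIDE = 4 * 1024 * 1024   # "one block every 4M" from the changelog entry
BLOCK_SIZE = 4096          # assumed block size; not stated in the changelog


def stride_offsets(span_bytes, stride=STRIDE):
    """Return the byte offsets of one block per stride within span_bytes."""
    return list(range(0, span_bytes, stride))


def burst_write(path, offsets, block_size=BLOCK_SIZE):
    """Write one block at every offset back to back, then flush once.

    WARNING: this overwrites data at `path`; use a scratch target only.
    """
    payload = b"\xa5" * block_size
    fd = os.open(path, os.O_WRONLY)
    try:
        for off in offsets:
            os.pwrite(fd, payload, off)   # positional write, no seek needed
        os.fsync(fd)                      # push the whole burst out at once
    finally:
        os.close(fd)
    return len(offsets)
```

Calling burst_write(path, stride_offsets(span)) issues one write per 4M stride
across the given span, where path would be a scratch file or a test device.
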
> http://www.linbit.com/downloads/drbd/9.0/drbd-9.0.15-1.tar.gz
> https://github.com/LINBIT/drbd-9.0/releases/tag/drbd-9.0.15
> 
> best regards,
>  Phil

Congrats on the release!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould