[DRBD-user] drbdmanage 0.99.8

Roland Kammerer roland.kammerer at linbit.com
Wed Aug 9 14:20:13 CEST 2017



Dear DRBD9 & drbdmanage users & testers,

we have published version 0.99.8 of drbdmanage.

If you use drbdmanage, you certainly want to upgrade.

- Bug fix for a too short timeout in TCP communication: For a long time
  now, one leader node has communicated with all other nodes in the
  cluster via TCP. A wrong timeout gave satellites only 2s to finish
  their work (e.g., creating resource files, creating meta data, bringing
  the resource up). This was a bug, and in busy clusters users saw
  "pending actions" as a result.
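To illustrate why a too short timeout produces "pending actions", here
is a minimal Python sketch. It is not drbdmanage code: the socket pair,
the hypothetical `satellite`/`send_order` functions and the timings are
assumptions chosen for illustration. The leader waits for the
satellite's reply with a timeout. If the satellite needs longer than
that, the leader gives up even though the work eventually completes.

```python
import socket
import threading
import time


def satellite(conn: socket.socket, work_seconds: float) -> None:
    """Hypothetical satellite: receives an order, works, then reports back."""
    conn.recv(64)              # wait for the leader's order
    time.sleep(work_seconds)   # stands in for resource files, meta data, "up"
    conn.sendall(b"done")
    conn.close()


def send_order(work_seconds: float, timeout: float) -> str:
    """Leader side (sketch): send an order, wait up to `timeout` seconds."""
    leader, sat = socket.socketpair()
    worker = threading.Thread(target=satellite, args=(sat, work_seconds))
    worker.start()
    try:
        leader.settimeout(timeout)
        leader.sendall(b"create-resource")
        return leader.recv(64).decode()
    except socket.timeout:
        # The leader gives up while the satellite is still busy, so from
        # the leader's point of view the action stays "pending".
        return "pending"
    finally:
        worker.join()   # let the satellite finish before closing our end
        leader.close()
```

With a timeout shorter than the satellite's work time, the order ends
up "pending" even though the satellite finishes its work afterwards.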

- Change quorum tracking: Quorum tracking is used, for example, for
  leader election: one of the nodes becomes the leader only if a
  majority of nodes is present. So far, this was based on connect events
  on the control volume. If a node missed such an event (e.g., because
  the volume was already up), it considered the other nodes offline even
  though everything was fine. Users frequently saw this when executing
  "drbdmanage nodes". Depending on when a node started, the output was
  even inconsistent across the cluster, because each node prints that
  status from its local information. Now all (DRBD) connected nodes are
  considered. That said, the output of "drbdmanage role" is the one that
  matters.
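The difference between event-based and state-based quorum tracking can
be sketched in a few lines of Python. This is an illustration under
assumptions, not drbdmanage's implementation: the hypothetical
`EventBasedView` class, `state_based_view` and `has_quorum` functions,
and the representation of connection states as strings such as
"Connected" are chosen for the example. A node that missed the connect
events sees no peers, while counting the current connections finds
them.

```python
from __future__ import annotations


def has_quorum(cluster_size: int, connected_peers: set[str]) -> bool:
    """Majority check: this node plus its peers must exceed half the cluster."""
    return (len(connected_peers) + 1) * 2 > cluster_size


class EventBasedView:
    """Old approach (sketch): a peer counts only after a connect event."""

    def __init__(self) -> None:
        self.online: set[str] = set()

    def on_connect_event(self, peer: str) -> None:
        self.online.add(peer)


def state_based_view(connections: dict[str, str]) -> set[str]:
    """New approach (sketch): every currently connected peer counts."""
    return {peer for peer, state in connections.items() if state == "Connected"}
```

Because the state-based view is derived from the current connections,
it does not depend on whether a node was running when the connect
events happened.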

- So far, drbdmanage relied only on its own view of the world. For
  example, if it thought a deployment was pending, it retried ad
  infinitum and failed because the deployment had in fact already been
  done. For deployments, it now also checks the real world, i.e.,
  whether the DRBD resource already exists and is healthy. If it does,
  drbdmanage considers the deployment successful.
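A minimal Python sketch of "check the real world before retrying"
follows. The hypothetical `deploy` function and its callables are
assumptions for illustration: `is_deployed_and_healthy` stands in for
whatever probe inspects the actual DRBD resource, and `do_deploy` for
the deployment step. Neither is a drbdmanage API.

```python
from typing import Callable


def deploy(resource: str,
           is_deployed_and_healthy: Callable[[str], bool],
           do_deploy: Callable[[str], None],
           max_attempts: int = 3) -> bool:
    """Sketch: consult the actual resource state before every attempt."""
    for _ in range(max_attempts):
        # If the resource already exists and is healthy, the deployment
        # counts as successful and we stop retrying.
        if is_deployed_and_healthy(resource):
            return True
        try:
            do_deploy(resource)
        except RuntimeError:
            continue  # retry, but re-check the real state first
    return is_deployed_and_healthy(resource)
```

Bounding the retries and re-checking the real state on every pass means
an already deployed resource ends the loop instead of failing forever.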

- Fix locking between the leader and its satellites. By switching to TCP
  for communication and a threaded TCP server, we obviously introduced
  concurrency. Unfortunately, the locking between local components, and
  between the cluster nodes in general, was incomplete. A read on a
  satellite at the wrong point in time overwrote the local control
  volume (cluster DB), which under certain conditions was then sent back
  to the leader as the new cluster DB. With this fix, it no longer
  matters whether commands are executed on the leader or on a satellite.
  This was tested with while-true loops on satellites and the leader
  while another satellite created resources. Many of them ;).
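The kind of race described above can be avoided by guarding every
access to the local cluster DB copy with one lock. The Python sketch
below is an illustration, not drbdmanage's actual lock layout. The
hypothetical `ClusterDB` class and its serial number check, which
rejects a stale copy instead of letting it overwrite newer local state,
are assumptions made for the example.

```python
import threading


class ClusterDB:
    """Sketch of a local cluster DB copy guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._resources: dict = {}
        self.serial = 0  # hypothetical: bumped on every committed change

    def modify(self, name: str, value: str) -> None:
        # The read-modify-write runs entirely under the lock, so a
        # concurrent refresh cannot slip in between and clobber it.
        with self._lock:
            resources = dict(self._resources)
            resources[name] = value
            self._resources = resources
            self.serial += 1

    def refresh(self, leader_copy: dict, leader_serial: int) -> None:
        # Accept the leader's copy only if it is not older than ours.
        with self._lock:
            if leader_serial >= self.serial:
                self._resources = dict(leader_copy)
                self.serial = leader_serial

    def snapshot(self) -> tuple:
        # Return a consistent (data, serial) pair for export.
        with self._lock:
            return dict(self._resources), self.serial
```

Running many concurrent writers against such an object, similar in
spirit to the while-true loops mentioned above, should never lose an
update.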

- Read actions (and therefore local updates of the cluster DB) could
  trigger actions on satellites. For example, satellites created
  resources, later received the order from the leader to create the
  same, already existing resource, and failed because it already
  existed. Now satellite nodes do not execute any actions without an
  order from the leader.
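The new rule can be sketched in Python as a satellite that only records
state when its DB copy is refreshed and acts only on explicit orders.
The hypothetical `Satellite` class and its method names are assumptions
for illustration. Treating an already existing resource as success
rather than as an error is also an illustrative choice, not confirmed
drbdmanage behavior.

```python
class Satellite:
    """Sketch: a satellite that never acts on DB refreshes alone."""

    def __init__(self) -> None:
        self.cluster_db: dict = {}          # local copy of the cluster DB
        self.local_resources: set = set()   # resources created on this node

    def on_db_refresh(self, new_db: dict) -> None:
        # Updating the local copy only records state; no actions run here.
        self.cluster_db = dict(new_db)

    def on_leader_order(self, resource: str) -> str:
        # Actions run only on an order from the leader.
        if resource in self.local_resources:
            return "exists"   # illustrative: report instead of failing
        self.local_resources.add(resource)
        return "created"
```

Because creation happens in exactly one place, a later order from the
leader can no longer collide with a resource that a read already
created.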

More details can be found in the corresponding git logs[2].

As usual, we provide tar-balls[1], a git repo[2], and an Ubuntu PPA[3].

If you have any questions, suggestions or feedback for us, feel free to
post to the drbd-user mailing list.

Best regards, rck

[1] https://www.linbit.com/en/drbd-community/drbd-download/
[2] http://git.linbit.com/drbdmanage.git/
[3] https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack
