Dear DRBD9 & drbdmanage users & testers,

we have published version 0.99.8 of drbdmanage. If you use drbdmanage,
you certainly want to upgrade.

- Bug fix for a too-short timeout for TCP communication: For a long
  time now, one leader node has communicated with all other nodes in
  the cluster via TCP. A wrong timeout gave satellites only 2s to
  finish their work (e.g., creating resource files, creating meta
  data, bringing the resource up). This was a bug, and in busy
  clusters users saw "pending actions".

- Change quorum tracking: Quorum tracking is used, for example, for
  leader election. Only if there is a majority of nodes does one of
  them become the leader. This was based on connect events on the
  control volume. If a node missed that event (e.g., the volume was
  already up), it considered other nodes offline, even though
  everything was fine. Users saw that frequently when executing
  "drbdmanage nodes". Depending on when a node started, this was even
  inconsistent within the cluster, as the local information is used to
  print that status. Now all (drbd) connected nodes are considered.
  That said, the output of "drbdmanage role" is the important one.

- So far drbdmanage relied only on its own view of the world. For
  example, if it thought a deployment was pending, it retried ad
  infinitum and failed because the deployment was in fact already
  done. For deployment it now also considers the real world and checks
  whether the drbd resource already exists and is healthy. In that
  case it considers the deployment successful.

- Fix locking between the leader and its satellites. By switching to
  TCP for communication and a threaded TCP server, we obviously
  introduced concurrency. Unfortunately the locking between local
  components, as well as between the cluster nodes in general, was
  incomplete. A read on a satellite at the wrong point in time
  overwrote the local control volume (cluster DB), which under certain
  conditions was then sent back to the leader as the new cluster DB.
  With that fix it no longer matters whether commands are executed on
  the leader or on a satellite. This was tested with while-true loops
  on satellites and the leader while another satellite created
  resources. Many of them ;).

- Read actions (and therefore local updates of the cluster DB)
  potentially triggered actions on satellites. For example, they
  created resources, then later got the order from the leader to
  create that same, existing resource and failed because it already
  existed. Now satellite nodes do not execute any actions without
  getting an order from the leader.

More details can be found in the corresponding git logs[2].

As usual, we provide tarballs[1], a git repo[2], and an Ubuntu PPA[3].

If you have any questions, suggestions or feedback for us, feel free to
post to the drbd-user mailing list.

Best regards, rck

[1] https://www.linbit.com/en/drbd-community/drbd-download/
[2] http://git.linbit.com/drbdmanage.git/
[3] https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack
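
PS: For readers who want the quorum rule from the notes above in code
form, here is a minimal sketch. It is not taken from the drbdmanage
sources; the function name, its parameters, and the choice to count the
local node as reachable are assumptions made for illustration only. It
shows just the majority test: a node may take part in leader election
only if it, together with the peers it currently sees as (drbd)
connected, forms a strict majority of the cluster.

```python
def has_quorum(connected_peers, cluster_size):
    """Return True if the local node plus its connected peers form a
    strict majority of the cluster.

    Hypothetical helper, not drbdmanage API:
    connected_peers -- iterable of names of peers currently connected
    cluster_size    -- total number of nodes, including the local one
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # Assumption: the local node always counts as reachable.
    # Duplicate peer names are counted once.
    reachable = len(set(connected_peers)) + 1
    # Strict majority: an exact half is not enough.
    return 2 * reachable > cluster_size
```

With three nodes, one connected peer is sufficient; with four nodes, a
2/2 split gives neither half a majority, so no leader would be elected
under this rule.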