[DRBD-user] drbd-reactor v0.9.0-rc.2

Roland Kammerer roland.kammerer at linbit.com
Fri Sep 9 11:07:49 CEST 2022


Dear DRBD users,

this is RC2 for drbd-reactor version 0.9.0. You did not miss RC1; I found
a stupid bug in it when I rolled it out on one of our internal
clusters...

The main new feature is that the promoter plugin can now freeze the
services of the currently active node when it loses quorum and then thaw
them when the node regains quorum. This might be an advantage when
starting services takes a long time (e.g., huge databases). Freezing and
thawing are instant and use the corresponding cgroup features via
"systemctl freeze" and "systemctl thaw".
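As a rough illustration of the freeze/thaw idea, here is a minimal Python
sketch that maps a node's quorum transition to the systemctl command it
would issue. The helper name quorum_transition_command and the unit name
example.service are hypothetical; this is not how the promoter plugin is
implemented internally, it only shows which systemctl verb corresponds to
losing and to regaining quorum.

```python
from typing import List, Optional


def quorum_transition_command(unit: str, had_quorum: bool,
                              has_quorum: bool) -> Optional[List[str]]:
    """Return the systemctl argv for a quorum change, or None if nothing changed.

    Hypothetical helper for illustration only; not part of drbd-reactor.
    """
    if had_quorum and not has_quorum:
        # Quorum lost: suspend every process in the unit's cgroup
        # instead of stopping the service.
        return ["systemctl", "freeze", unit]
    if not had_quorum and has_quorum:
        # Quorum regained: resume the frozen processes where they stopped.
        return ["systemctl", "thaw", unit]
    # No transition, nothing to do.
    return None


if __name__ == "__main__":
    print(quorum_transition_command("example.service", True, False))
    print(quorum_transition_command("example.service", False, True))
```

A real caller would hand the returned list to subprocess.run(); freezing
only works for units that have their own cgroup (e.g. services), and it
needs a systemd with unit freezing support on a unified cgroup hierarchy.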

Copying from the documentation [1]:
The default behavior when a DRBD Primary loses quorum is to immediately
stop the generated target unit and hope that other nodes that still have
quorum will successfully start the service. This works well if services
can be failed over/started on another node in reasonable time.
Unfortunately there are services that take a very long time to start,
for example huge databases.

When a DRBD Primary loses quorum, we basically have two possibilities:

- the rest of the nodes, or at least some of them, still have quorum:
  they have to start the service, as they are the only ones with quorum,
  but we can still keep the old Primary in a frozen state. Then, when
  the nodes with quorum come into contact with the old Primary again, it
  should stop the service and its storage should resync with the other
  nodes.
- the rest of the nodes are not able to form a partition with quorum: in
  this scenario there is no alternative anyway, we have to keep the
  Primary frozen. But if the other nodes eventually rejoin the old
  Primary and quorum is restored, we can simply thaw the old Primary
  (which is then also the new Primary).
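The two cases can be summarized as a small decision function for a
frozen former Primary. This is a hypothetical Python sketch of the
reasoning above, not drbd-reactor code; the inputs reconnected_to_quorum
and other_node_took_over are simplified stand-ins for the cluster state
the real plugin observes through DRBD.

```python
from enum import Enum


class Action(Enum):
    STAY_FROZEN = "stay frozen"
    STOP = "stop service, resync storage"
    THAW = "thaw"


def frozen_primary_action(reconnected_to_quorum: bool,
                          other_node_took_over: bool) -> Action:
    """Decide what a frozen former Primary should do (hypothetical sketch)."""
    if not reconnected_to_quorum:
        # Still cut off from any partition with quorum: keep waiting, frozen.
        return Action.STAY_FROZEN
    if other_node_took_over:
        # Case 1: a partition with quorum started the service elsewhere, so
        # the old Primary stops its service and its storage resyncs.
        return Action.STOP
    # Case 2: quorum came back without a failover, so just resume.
    return Action.THAW


if __name__ == "__main__":
    print(frozen_primary_action(True, True).value)
    print(frozen_primary_action(True, False).value)
```

The point of freezing is that the second case costs nothing but a thaw,
while the first case is no worse than the default behavior of stopping.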

There are several requirements for this to work properly:

- a system with unified cgroups (cgroup v2). If the file
  /sys/fs/cgroup/cgroup.controllers exists, you should be fine. This
  requires a relatively "new" kernel. Note that "even" RHEL8, for
  example, needs systemd.unified_cgroup_hierarchy added to the kernel
  command line.
- a service that can tolerate being frozen
- DRBD option on-suspended-primary-outdated set to force-secondary
- DRBD option on-no-quorum set to suspend-io
- DRBD net option rr-conflict set to retry-connect
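The checkable requirements above can be sketched as a small Python
function. It is a hypothetical helper, not part of drbd-reactor: the DRBD
settings are passed in as a plain dict (reading them from your actual
DRBD configuration is left out), and the quorum-loss setting uses its
drbd.conf name, on-no-quorum. Whether a service tolerates being frozen
cannot be checked this way.

```python
import os
from typing import Dict, List

# DRBD settings required for freezing, keyed by their drbd.conf names.
REQUIRED_DRBD_OPTIONS = {
    "on-suspended-primary-outdated": "force-secondary",
    "on-no-quorum": "suspend-io",
    "rr-conflict": "retry-connect",
}


def check_freeze_requirements(drbd_options: Dict[str, str],
                              cgroup_root: str = "/sys/fs/cgroup") -> List[str]:
    """Return a list of unmet requirements (an empty list means all good)."""
    problems = []
    # A unified (v2) cgroup hierarchy exposes cgroup.controllers at its root.
    if not os.path.exists(os.path.join(cgroup_root, "cgroup.controllers")):
        problems.append("unified cgroup hierarchy not detected")
    for name, wanted in REQUIRED_DRBD_OPTIONS.items():
        actual = drbd_options.get(name)
        if actual != wanted:
            problems.append(f"{name} is {actual!r}, expected {wanted!r}")
    return problems


if __name__ == "__main__":
    # Example input; in practice these values come from your DRBD config.
    opts = {"on-no-quorum": "io-error", "rr-conflict": "retry-connect"}
    for problem in check_freeze_requirements(opts):
        print(problem)
```

drbd-reactor performs its own option checks at startup, so treat this
only as an illustration of what the list above asks for.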

If these requirements are fulfilled, then one can set the promoter
option "on-quorum-loss" to "freeze".
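For illustration, a small Python sketch that renders a promoter resource
section with this option set. It assumes the [[promoter]] /
[promoter.resources.<name>] TOML layout used in drbd-reactor's example
configuration; the helper name, the resource name res0 and the unit
example.service are placeholders, and the exact option placement should
be checked against the promoter documentation [1].

```python
from typing import List


def promoter_snippet(resource: str, start_units: List[str]) -> str:
    """Render a promoter section with on-quorum-loss = "freeze".

    Hypothetical helper; assumes the [[promoter]] / [promoter.resources.<name>]
    TOML layout. Resource and unit names are placeholders.
    """
    units = ", ".join(f'"{u}"' for u in start_units)
    return (
        "[[promoter]]\n"
        f"[promoter.resources.{resource}]\n"
        f"start = [{units}]\n"
        'on-quorum-loss = "freeze"\n'
    )


if __name__ == "__main__":
    print(promoter_snippet("res0", ["example.service"]), end="")
```

Leaving the option unset keeps the default behavior of stopping the
services on quorum loss.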

Consider this an experimental feature; most users probably should not
enable it by default. It might be handy in specific situations, but the
more classic behavior of stopping the services is probably the better
default for most users.

Also, and this is important in general: check the output of "systemctl
status drbd-reactor.service". drbd-reactor runs all kinds of option
checks on your DRBD resources and tells you which options are missing or
wrong. Follow these suggestions!

Regards, rck

GIT: https://github.com/LINBIT/drbd-reactor/commit/a5cf59bf84359f0095154e8db78489c627f968ad
TGZ: https://pkg.linbit.com//downloads/drbd/utils/drbd-reactor-0.9.0-rc.2.tar.gz
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack

Changelog:
[ Roland Kammerer ]
* doc: make man pages o+r
* docs,promoter: hint to use provided packages
* promoter: warn if mount unit is topmost unit
* promoter: implement on-quorum-loss policy
* promoter: relax ocf parser
* ctl: add resource filter
* ctl: fix status without res filter

[1] https://github.com/LINBIT/drbd-reactor/blob/master/doc/promoter.md#freezing-resources