[DRBD-user] drbd-reactor v0.5.0-rc.1 (including HA FS mount example)
roland.kammerer at linbit.com
Fri Nov 12 15:32:15 CET 2021
Dear DRBD(-reactor) users,
this is the first release candidate of version 0.5.0.
Besides minor fixes for Ubuntu Bionic and some upgrades for containers,
the main feature is proper demote failure handling in the promoter.
There was the "on-stop-failure" action, which worked at one point but
has not done anything since we switched to a fancier systemd.target
logic. What we really care about when managing services (and a potential
fail-over because of a service failure) is whether things are stopped in a
way that lets the DRBD device be demoted to secondary. If not, we might
need to halt or reboot the node so that another node can take over the
DRBD resource and the services depending on it.
This is done via the new setting "on-drbd-demote-failure".
"on-stop-failure" is deprecated and ignored. The new option can be set
to any action that systemd.unit(5) defines for "FailureAction".
If the DRBD resource cannot be demoted, that action is executed.
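
To make the accepted values concrete, here is a small sketch that checks a candidate value before it goes into the config. The helper name is hypothetical, and the list is the set of FailureAction= values I know from systemd.unit(5) on older systemd releases; newer releases accept additional values, so treat it as an assumption and check your local man page.

```shell
#!/bin/sh
# Hypothetical helper: is "$1" one of the FailureAction= values from
# systemd.unit(5)? Assumption: list taken from older systemd releases;
# newer ones accept more values.
is_failure_action() {
    case "$1" in
        none|reboot|reboot-force|reboot-immediate) return 0 ;;
        poweroff|poweroff-force|poweroff-immediate) return 0 ;;
        exit|exit-force) return 0 ;;
        *) return 1 ;;
    esac
}

for action in reboot reboot-immediate restart; do
    if is_failure_action "$action"; then
        echo "$action: ok"
    else
        echo "$action: not a known FailureAction value"
    fi
done
```

"reboot" is what the example further down uses; the "-force" and "-immediate" variants skip more and more of the regular shutdown sequence.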
Let's see what that looks like in an HA cluster providing a file system
mount. I assume a working LINSTOR cluster (although that is not strictly required).
A good way to help us test is to use the PPA. If you are using fresh
VMs, make sure that you restart multipathd after the first drbd-utils
install. So, let's assume a 3 node cluster, which also has this RC of drbd-reactor installed.
Let's create a 3 node DRBD resource:
$ linstor rg c --place-count 3 promoter
$ linstor rg drbd-options promoter --auto-promote no
$ linstor rg drbd-options promoter --quorum majority
$ linstor rg drbd-options promoter --on-no-quorum io-error
$ linstor vg c promoter
$ linstor rg spawn promoter test 20M
And a file system:
$ drbdadm primary test
$ mkfs.ext4 /dev/drbd1000
$ drbdadm secondary test
And a mount unit for the storage:
on *all* nodes:
$ cat <<EOF > /etc/systemd/system/mnt-test.mount
Description=Mount /dev/drbd1000 to /mnt/test
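
A systemd mount unit also needs a [Mount] section that names the device and the mount point. As a minimal sketch of a complete unit: the What=, Where= and Type= values below are inferred from the Description= line and the mkfs.ext4 step, not quoted from this mail, and UNIT_DIR is a hypothetical variable so the file can be written to a scratch directory and inspected first.

```shell
#!/bin/sh
# Sketch only: the [Mount] values are inferred from the Description= line
# and the mkfs.ext4 step above, not copied from the original mail.
# UNIT_DIR is a hypothetical override; on a real node it would be
# /etc/systemd/system.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
unit_file="$UNIT_DIR/mnt-test.mount"

cat <<EOF > "$unit_file"
[Unit]
Description=Mount /dev/drbd1000 to /mnt/test

[Mount]
What=/dev/drbd1000
Where=/mnt/test
Type=ext4
EOF

echo "wrote $unit_file"
```

Note that systemd requires the unit file name to match the mount point, so /mnt/test has to live in mnt-test.mount.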
And a simple drbd-reactor::promoter config:
on *all* nodes:
$ cat <<EOF > /etc/drbd-reactor.d/mnt-test.toml
[[promoter]]
id = "mnt-test"
[promoter.resources.test]
start = ["mnt-test.mount"]
on-drbd-demote-failure = "reboot"
EOF
on *all* nodes:
$ systemctl start drbd-reactor
Then you can check which node is Primary and has the device mounted:
$ drbd-reactorctl status mnt-test
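
If you want the role in a script, one option is to parse "drbdadm status" instead. The following is a sketch under the assumption that the first output line has the form "<resource> role:<Role>"; the function names and the sample text are illustrative and were not captured from a real cluster.

```shell
#!/bin/sh
# Print the local role from `drbdadm status <res>` output read on stdin.
# Assumption: the first line looks like "<resource> role:<Role>".
local_role() {
    awk 'NR == 1 { sub(/^role:/, "", $2); print $2 }'
}

# Illustrative sample, not captured from a real cluster.
sample_status() {
    cat <<'EOF'
test role:Primary
  disk:UpToDate
  node-b role:Secondary
    peer-disk:UpToDate
  node-c role:Secondary
    peer-disk:UpToDate
EOF
}

role=$(sample_status | local_role)
echo "local role: $role"

# On a real node one would run: drbdadm status test | local_role
```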
On the node that is Primary you can do a switch-over, just for testing:
$ drbd-reactorctl disable --now mnt-test
$ # another node should be primary now and have the FS mounted
$ drbd-reactorctl enable mnt-test # to re-enable the config again
Testing demote failure. Connect to the node that is Primary:
$ touch /mnt/test/lock
$ sleep 3600 < /mnt/test/lock &
$ # ^^ this creates an opener and the mount unit will be unable to stop
$ # and the DRBD device will be unable to demote
$ systemctl restart drbd-services@test.target # trigger a stop/restart of the target
This should trigger the reboot action and another node should take over.
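
To see why the "sleep ... < /mnt/test/lock" trick blocks the unmount, the same opener can be reproduced in a scratch directory without touching a real mount. This is a Linux-only sketch (it reads /proc), and all paths and variable names in it are made up for the demo.

```shell
#!/bin/sh
# Self-contained demo of an "opener", using a scratch directory instead of
# /mnt/test. Linux-only: it inspects /proc/<pid>/fd.
scratch=$(cd "$(mktemp -d)" && pwd -P)
lock="$scratch/lock"
touch "$lock"

# Same trick as above: a background process keeps the file open on stdin.
sleep 30 < "$lock" &
opener_pid=$!

# Wait briefly until the child has set up its redirection, then read
# where its fd 0 points.
held=""
i=0
while [ "$i" -lt 50 ]; do
    held=$(readlink "/proc/$opener_pid/fd/0" 2>/dev/null)
    [ "$held" = "$lock" ] && break
    i=$((i + 1))
    sleep 0.1
done
echo "pid $opener_pid holds $held"

# Clean up: once the opener is gone, nothing pins the directory any more.
kill "$opener_pid"
wait "$opener_pid" 2>/dev/null || true
rm -rf "$scratch"
```

On a real node, "fuser -vm /mnt/test" (from psmisc) lists the processes that keep a mounted file system busy in this way.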
Please help with testing.