[DRBD-user] Dealing with node failure

Merijn Visser - Uw Online ICT merijn at uwonlineict.nl
Tue Jan 5 23:22:30 CET 2016


Hello,

We have a 3-node cluster running DRBD 9, which really is very nice.
One node crashed and we need to bring it back into the cluster, but we
have not been able to do so.
We reinstalled it the same way as the others, following this setup:
https://pve.proxmox.com/wiki/DRBD9. Now we want to remove the failed
node from the cluster, working from the primary node. Unfortunately we
cannot delete an offline node from the cluster. The attempt produces
this kernel message:
"drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by
config"

We think the easiest way would be to remove the node from the cluster
and add it again, so that all configuration and existing volumes get
synchronised automatically.

Here are some details:
drbdmanage list-nodes
+----------------------------------------------------------------------+
| Name  | Pool Size | Pool Free | Site |                     |   State |
|----------------------------------------------------------------------|
| hyp10 |  16777216 |         0 |  N/A |                     |      ok |
| hyp20 |  16777216 |  16521161 |  N/A |                     | OFFLINE |
| hyp30 |  16777216 |   9690519 |  N/A |                     |      ok |
+----------------------------------------------------------------------+
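
For anyone who wants to check node states from a script rather than by eye, here is a minimal sketch that parses the table above. It only reads the text layout shown in this message; the function names (parse_nodes, offline_nodes) are hypothetical, and it assumes the column layout does not differ between drbdmanage versions.

```python
# Sketch: parse the "drbdmanage list-nodes" table shown above.
# Assumption: the output keeps this exact pipe-separated layout.

SAMPLE = """\
+----------------------------------------------------------------------+
| Name  | Pool Size | Pool Free | Site |                     |   State |
|----------------------------------------------------------------------|
| hyp10 |  16777216 |         0 |  N/A |                     |      ok |
| hyp20 |  16777216 |  16521161 |  N/A |                     | OFFLINE |
| hyp30 |  16777216 |   9690519 |  N/A |                     |      ok |
+----------------------------------------------------------------------+
"""


def parse_nodes(text):
    """Return one dict per node row, keyed by the table's header cells."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip the +---+ borders and the |---| separator under the header.
        if not line.startswith("|") or line.startswith("|-"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data-like line is the header
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def offline_nodes(text):
    """Names of all nodes whose State column reads OFFLINE."""
    return [row["Name"] for row in parse_nodes(text) if row["State"] == "OFFLINE"]


if __name__ == "__main__":
    print(offline_nodes(SAMPLE))
```

In a live setup the text would come from the command's output instead of the SAMPLE string.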

drbdmanage remove-node -f hyp20
You are going to remove the node 'hyp20' from the cluster. This will
remove all resources from the node.
Please confirm:
  yes/no: yes
Jan  5 23:09:43 hyp30 kernel: [717721.879728] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:09:48 hyp30 kernel: [717726.880759] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:09:52 hyp30 kernel: [717731.278239] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:09:56 hyp30 kernel: [717735.163982] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:10:00 hyp30 kernel: [717738.849674] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:10:05 hyp30 kernel: [717743.614836] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Traceback (most recent call last):
  File "/usr/bin/drbdmanage", line 30, in <module>
    drbdmanage_client.main()
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 3520, in main
    client.run()
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 1130, in run
    self.parse(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 991, in parse
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/drbdmanage_client.py", line 1301, in cmd_remove_node
    dbus.String(node_name), dbus.Boolean(force)
  File "/usr/lib/python2.7/dist-packages/dbus/proxies.py", line 70, in __call__
    return self._proxy_method(*args, **keywords)
  File "/usr/lib/python2.7/dist-packages/dbus/proxies.py", line 145, in __call__
    **keywords)
  File "/usr/lib/python2.7/dist-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
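
To see how often the promotion attempt is retried before the client gives up, the kernel messages above can be summarised with a small script. This is only a sketch: the regular expression is written against the message format quoted in this mail (with wrapped lines rejoined), and the name summarize is hypothetical.

```python
# Sketch: count the "Auto-promote failed" kernel messages quoted above
# and measure the time between the first and the last one.
# Assumption: each message sits on a single line in this exact format.
import re

LOG = """\
Jan  5 23:09:43 hyp30 kernel: [717721.879728] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:09:48 hyp30 kernel: [717726.880759] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:09:52 hyp30 kernel: [717731.278239] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:09:56 hyp30 kernel: [717735.163982] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:10:00 hyp30 kernel: [717738.849674] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
Jan  5 23:10:05 hyp30 kernel: [717743.614836] drbd .drbdctrl: Auto-promote failed: Multiple primaries not allowed by config
"""

# Captures the kernel uptime stamp and the DRBD resource name.
PATTERN = re.compile(r"\[\s*(\d+\.\d+)\] drbd (\S+): Auto-promote failed")


def summarize(log):
    """Return (number of failures, seconds between first and last)."""
    stamps = [float(match.group(1)) for match in PATTERN.finditer(log)]
    if not stamps:
        return 0, 0.0
    return len(stamps), stamps[-1] - stamps[0]


if __name__ == "__main__":
    count, span = summarize(LOG)
    print("%d failures over %.1f seconds" % (count, span))
```

The span is taken from the kernel uptime stamps in square brackets, not from the wall-clock times at the start of each line.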

We also followed the guide at
http://drbd.linbit.com/users-guide-9.0/s-node-failure.html#s-perm-node-failure
but without any success. How can we connect the .drbdctrl resource?

Does anybody have advice on how to restore this cluster?

Kind regards,

Merijn
