robert.altnoeder at linbit.com
Fri Jul 27 15:30:13 CEST 2018
On 07/27/2018 02:41 PM, Yannis Milios wrote:
> - 3 nodes in the cluster (A, B, C), all configured as 'Combined' nodes;
> nodeC acts as the controller.
Satellite and Controller are quite obvious. A Combined node runs a
Satellite and may sometimes also run a Controller. An Auxiliary node runs
neither, but is registered for other reasons; this is mostly reserved for
future features.
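As a sketch, the node type is selected when registering a node with the client; the node name and IP address below are placeholders for your environment:

```shell
# Register a node as a Combined type (runs a Satellite, may run a Controller).
# "nodeA" and the IP address are placeholders.
linstor node create --node-type Combined nodeA 192.168.0.11
```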
> Let's assume that nodeA fails and it will not come up any time soon, so I
> want to remove it from the cluster. To accomplish that I use
> "linstor node delete <NodeA>". The problem is that the node (which
> appears as OFFLINE) never gets deleted from the cluster. Obviously
> the controller is awaiting the dead node's confirmation and
> refuses to remove the entry without it. Is there any way to force
> remove the dead node from the database?
> The same applies when deleting a RD, R or VD from the same node. In DM
> there was a force option (-f), which was useful in such situations.
There is a NodeLost API and a corresponding command for it. There are no
force options otherwise; instead, it is expected that a system
administrator will clean up a resource manually if automatic cleanup
does not work. As soon as LINSTOR detects that the resource has been
cleaned up properly, it will disappear from LINSTOR's database, provided
the resource was marked for deletion.
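For a node that is permanently gone, the NodeLost API is exposed through the client; as a sketch (the node name is a placeholder):

```shell
# Declare the node permanently lost; the controller removes it from the
# database without waiting for the dead satellite to confirm anything.
# "nodeA" is a placeholder.
linstor node lost nodeA
```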
In the current version, there are still a few situations where this does
not work, e.g. if an entire storage pool is lost (because if the entire
storage pool does not work, LINSTOR cannot process resource deletion on
it). Commands for declaring storage as lost will be added, as well as
logic for dealing correctly with certain situations like a non-existent
volume group.
There are, however, no plans to add any force flags like those in
drbdmanage to resource management (or similar) commands, because they
frequently caused massive desyncs between drbdmanage's state and the real
state of the backend storage resources. They were frequently misused by
administrators, who also often expected the various "force" options to
do something completely different from what they actually did.
> - Is there any option to wipe all cluster information, similar to
> "drbdmanage uninit" in order to start from scratch? Purging all
> linstor packages does not seem to reset this information.
Deleting the database will cause LINSTOR to initialize a new database.
The database could be anywhere, depending on how LINSTOR was installed;
its current location can be found by looking at the connection-url
setting in the controller's database.cfg file.
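As a sketch, the relevant entry in database.cfg typically looks something like this; the file format and exact path vary by installation, and the value below is only an illustrative placeholder:

```
<properties version="1.0">
  <!-- The connection-url tells the controller where its database lives.
       The path below is an illustrative placeholder, not a default. -->
  <entry key="connection-url">jdbc:h2:/var/lib/linstor/linstordb</entry>
</properties>
```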
> - If nodeC (controller) dies, then logically we must decide which of the
> surviving nodes will replace it; let's say nodeB is selected as the
> controller node. After starting the linstor-controller service on nodeB
> and running "linstor n l", there are no cluster nodes in the
> list. Does this mean we have to re-create the cluster from scratch
> (guess no), or is there a way to import the config from the dead nodeC?
This is supposed to be managed by a cluster resource manager like pacemaker.
Obviously, in a multi-controller HA environment, the controller database
must be available on all nodes, and there are various possibilities to
ensure it is:
- Connect the LINSTOR controller to a centralized database cluster
reachable by all potential controllers
- Put the LINSTOR integrated database on a replicated storage volume,
such as a DRBD volume
- Connect the LINSTOR controller to a local external database and use
database replication to keep the other potential controllers up to date
- Put the LINSTOR integrated database on an NFS server
Automatic failover requires the usual cluster magic to make sure node
failures are detected and split brains are avoided (e.g., independent
cluster links, resource- and node-level fencing).
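As a minimal sketch of the pacemaker side, assuming the crm shell and the systemd resource agent are available (the resource name and timeouts are assumptions for illustration, and a real setup also needs colocation with whatever replicated storage holds the controller database):

```shell
# Let pacemaker run the linstor-controller service on exactly one node.
# "p_linstor-controller" and the timeout values are placeholders.
crm configure primitive p_linstor-controller systemd:linstor-controller \
    op start timeout=100s op stop timeout=100s op monitor interval=30s
```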
I'll leave answering the package-related questions to our packaging experts.