[DRBD-user] linstor-proxmox hangs forever because of tainted kernel
robert.altnoeder at linbit.com
Wed May 29 10:26:26 CEST 2019
On 5/28/19 6:16 PM, Alexander Karamanlidis wrote:
> hangs forever because of tainted kernel
Those hangs have nothing to do with the taint status that the kernel
shows, since none of the problem-related taint flags are set.
The kernel shows a taint of P O, which means:
- P: Proprietary module loaded
- O: Out-of-tree module loaded
That is a normal runtime status that does not indicate any problems.
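To check this on a running node, the numeric value in /proc/sys/kernel/tainted can be decoded into its flag letters. The following is a minimal sketch, not an official tool. The bit positions follow the kernel's tainted-kernels documentation, but the set of letters treated as "problem-related" here (M, B, D, W, L) is an assumption made for illustration, and the function names are hypothetical.

```python
# Taint bits 0..15 and their letters, per the kernel's
# Documentation/admin-guide/tainted-kernels.rst.
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module was force-loaded"),
    2: ("S", "kernel running on an out-of-spec system"),
    3: ("R", "module was force-unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver loaded"),
    11: ("I", "workaround for platform firmware bug applied"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel has been live-patched"),
}

# Assumption for this sketch: these letters hint at an actual runtime problem.
PROBLEM_LETTERS = {"M", "B", "D", "W", "L"}


def decode_taint(value):
    """Return the (letter, description) pairs for every set taint bit."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if value & (1 << bit)]


def problem_flags(value):
    """Return only the letters that this sketch treats as problem-related."""
    return [letter for letter, _ in decode_taint(value) if letter in PROBLEM_LETTERS]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value of the running kernel."""
    with open(path) as handle:
        return int(handle.read().strip())
```

A taint of P O corresponds to the value 4097 (bit 0 plus bit 12), for which problem_flags(4097) returns an empty list. On a live system, problem_flags(read_taint()) gives the same kind of answer for the running kernel.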
What's more interesting are the messages emitted by LINSTOR:
> Suspended IO of 'vm-102-disk-1' on 'node2' for snapshot
> Suspended IO of 'vm-102-disk-1' on 'node1' for snapshot
> (Node: 'node1') Preparing resources for layer StorageLayer failed
> External command timed out
> External command: lvs -o
> --separator ; --noheadings --units k --nosuffix drbdpool
> VM 102 qmp command 'savevm-end' failed - unable to connect to VM 102
> qmp socket - timeout after 5992 retries
> snapshot create failed: starting cleanup
> error with cfs lock 'storage-drbdpool': Could not remove
> vm-102-state-test123: got lock timeout - aborting command
> TASK ERROR: Could not create cluster wide snapshot for: vm-102-disk-1:
> exit code 10
Looks like LVM, or some subtask of it, is accessing the storage of
vm-102-disk-1 through DRBD (maybe LVM scanning DRBD devices), which will
hang, because I/O on that device is suspended in order to take a
cluster-wide consistent snapshot.
My guess is that this is an LVM configuration error that causes LVM to
access DRBD devices, a very common source of timeout problems of all kinds.
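The usual place to stop LVM from scanning DRBD devices is the filter (or global_filter) list in lvm.conf, where each entry is an "a" (accept) or "r" (reject) prefix followed by a delimited regular expression. The first matching entry decides, and a device that matches nothing is accepted. The sketch below mimics that evaluation order so a filter list can be tested against device paths before it goes into the configuration. It rests on assumptions: the example entry r|^/dev/drbd| is illustrative and not taken from this thread, the function names are hypothetical, and Python's re module only approximates LVM's own regex dialect.

```python
import re


def parse_filter_entry(entry):
    """Split an entry such as 'r|^/dev/drbd|' into (accept, compiled regex)."""
    if len(entry) < 3 or entry[0] not in ("a", "r") or entry[-1] != entry[1]:
        raise ValueError("malformed filter entry: %r" % entry)
    # entry[1] is the delimiter; the pattern sits between the two delimiters.
    return entry[0] == "a", re.compile(entry[2:-1])


def device_accepted(device, filter_entries):
    """Evaluate entries in order; the first pattern that matches decides."""
    for entry in filter_entries:
        accept, regex = parse_filter_entry(entry)
        if regex.search(device):
            return accept
    # No entry matched: the device is accepted by default.
    return True
```

With the list ["r|^/dev/drbd|"], device_accepted("/dev/drbd1000", ...) is False while /dev/sda1 is still accepted. Because the first match wins, an accept-all entry placed before the reject entry would let DRBD devices through again.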
> We also have LVM_THIN Storage Pools.
Those also block whenever they run full, so checking that may be a good idea.
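One way to check thin pool usage is to parse the output of something like
lvs -o lv_name,lv_attr,data_percent,metadata_percent --separator ';' --noheadings drbdpool
The sketch below assumes exactly that field order and separator, assumes percentages are printed with a decimal point, and uses an arbitrary 90% threshold. The function name is hypothetical. Thin pools are recognized by a leading "t" in lv_attr.

```python
def full_thin_pools(lvs_output, threshold=90.0):
    """Return (name, data_percent, metadata_percent) for nearly full thin pools."""
    alerts = []
    for line in lvs_output.splitlines():
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 4:
            continue  # blank line or unexpected format
        name, attr, data, meta = fields
        if not attr.startswith("t"):
            continue  # not a thin pool (thin volumes start with 'V')
        # Empty percentage fields are treated as 0.0 in this sketch.
        data_pct = float(data) if data else 0.0
        meta_pct = float(meta) if meta else 0.0
        # Either data or metadata space running out can block the pool.
        if max(data_pct, meta_pct) >= threshold:
            alerts.append((name, data_pct, meta_pct))
    return alerts
```

The text to parse could be obtained with subprocess.run(..., capture_output=True, text=True, timeout=...). A timeout is worth setting here, given that lvs itself was the command that timed out in the log above.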