[DRBD-user] linstor-proxmox hangs forever because of tainted kernel

Robert Altnoeder robert.altnoeder at linbit.com
Wed May 29 10:26:26 CEST 2019


On 5/28/19 6:16 PM, Alexander Karamanlidis wrote:

> hangs forever because of tainted kernel

Those hangs have nothing to do with the taint status that the kernel
shows, since none of the problem-related taint flags are set.
The kernel shows a taint of P O, which means:

- P: Proprietary module loaded
- O: Out-of-tree module loaded

That is a normal runtime status that does not indicate any problems.
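
For illustration, here is a minimal Python sketch of how the taint letters
map to the numeric value in /proc/sys/kernel/tainted. The letter table
follows the bit order documented for the kernel's taint flags. The function
names and the choice of which flags count as "problem-related" are
assumptions made for this example only.

```python
# Taint letters indexed by bit position, as documented for
# /proc/sys/kernel/tainted (bit 0 = P, bit 12 = O, and so on).
TAINT_LETTERS = "PFSRMBUDAWCIOELKXT"

# Assumption for this sketch: flags that usually point at an actual problem
# (machine check, bad page, oops/BUG, warning, soft lockup).
PROBLEM_FLAGS = set("MBDWL")


def decode_taint(value):
    """Return the list of taint letters set in the numeric taint value."""
    return [letter for bit, letter in enumerate(TAINT_LETTERS)
            if value & (1 << bit)]


def has_problem_flags(value):
    """Return True if any flag from PROBLEM_FLAGS is set in the value."""
    return any(letter in PROBLEM_FLAGS for letter in decode_taint(value))
```

A kernel tainted with only P and O reports the value 4097 (1 + 4096), for
which has_problem_flags() returns False.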


What's more interesting are the messages emitted by LINSTOR:

> SUCCESS:
>
>     Suspended IO of 'vm-102-disk-1' on 'node2' for snapshot
> SUCCESS:
>     Suspended IO of 'vm-102-disk-1' on 'node1' for snapshot
>

> ERROR:
> Description:
>     (Node: 'node1') Preparing resources for layer StorageLayer failed
> Cause:
>     External command timed out
> Details:
>     External command: lvs -o
> lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr
> --separator ; --noheadings --units k --nosuffix drbdpool
> VM 102 qmp command 'savevm-end' failed - unable to connect to VM 102
> qmp socket - timeout after 5992 retries
> snapshot create failed: starting cleanup
> error with cfs lock 'storage-drbdpool': Could not remove
> vm-102-state-test123: got lock timeout - aborting command
> TASK ERROR: Could not create cluster wide snapshot for: vm-102-disk-1:
> exit code 10
>

Looks like LVM, or some subtask of it, is accessing the storage of
vm-102-disk-1 through DRBD (maybe LVM scanning DRBD devices). That access
will hang, because I/O on that device is suspended in order to take a
cluster-wide consistent snapshot.
My guess is that this is an LVM configuration error, such as a missing
device filter in lvm.conf, that causes LVM to access DRBD devices. This is
a very common source of timeout problems of all kinds.
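
To make the filtering idea concrete, here is a small Python sketch that
emulates how an LVM device filter list (the filter or global_filter setting
in the devices section of lvm.conf) is evaluated: the first matching pattern
decides, and a device that matches no pattern is accepted. It only handles
simple patterns with '|' as the delimiter. The function name and the example
pattern are hypothetical, and the right filter for a given system may differ.

```python
import re


def lvm_filter_accepts(device_path, patterns):
    """Emulate first-match-wins evaluation of an LVM filter list.

    Each pattern is 'a' (accept) or 'r' (reject) followed by a regex
    enclosed in '|' delimiters, e.g. 'r|^/dev/drbd|'.
    Devices that match no pattern are accepted.
    """
    for pattern in patterns:
        action, regex = pattern[0], pattern[2:-1]
        if re.search(regex, device_path):
            return action == "a"
    return True


# Hypothetical filter list: reject DRBD devices, accept everything else.
example_filter = ["r|^/dev/drbd|"]
```

With example_filter, /dev/drbd1000 is rejected while /dev/sda3 is still
accepted. Without any filter, both would be scanned.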


> We also have LVM_THIN Storage Pools. 

Those also block whenever they run full, so checking that may be a good
idea too.
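
One way to check this is to look at data_percent in output of the same lvs
call that LINSTOR issues above. The Python sketch below parses such output
and lists thin pools at or above a fill threshold. The function name, the
90% threshold, and the sample lines are assumptions for illustration, and
the sample is not taken from the affected system.

```python
def full_thin_pools(lvs_output, threshold=90.0):
    """Return (vg_name, lv_name, data_percent) for thin pools >= threshold.

    Expects lines with the semicolon-separated fields
    lv_name;lv_path;lv_size;vg_name;pool_lv;data_percent;lv_attr
    """
    result = []
    for line in lvs_output.splitlines():
        fields = line.strip().split(";")
        if len(fields) != 7:
            continue  # skip blank or unexpected lines
        lv_name, _path, _size, vg_name, _pool, data_percent, lv_attr = fields
        # In lv_attr, a leading 't' marks a thin pool.
        if lv_attr.startswith("t") and data_percent:
            percent = float(data_percent)
            if percent >= threshold:
                result.append((vg_name, lv_name, percent))
    return result


# Hypothetical sample output, not taken from the reporter's system.
sample = """
  thinpool;;104857600.00;drbdpool;;97.31;twi-aotz--
  vm-102-disk-1_00000;/dev/drbdpool/vm-102-disk-1_00000;33554432.00;drbdpool;thinpool;41.20;Vwi-aotz--
"""
```

Thin pool metadata can fill up as well, so the metadata_percent field of lvs
is worth a look in the same way.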

br,
Robert


