[DRBD-user] DRBD9 with PVE4.4, failure to restore VMs

Den drbdsys at made.net
Sat Mar 25 11:02:41 CET 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,
I'm testing a 3-node cluster, freshly installed from the Linbit repository for PVE,
with the latest versions.

I have configured LVM as backend storage and set redundancy to 1 for now.
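
For reference, the DRBD storage entry in /etc/pve/storage.cfg looks more or less
like this (a sketch from memory; only the storage name "drbd1" and the redundancy
value come from my setup, the option names may differ slightly):

drbd: drbd1
        content images
        redundancy 1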

So when I create a VM on node1, for example, it can happen that the resource
is allocated on node2 and I get a diskless resource on node1, but this does
not seem to be a problem.
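
Where the data actually landed can be checked with something like this (a sketch;
the resource name is just an example, and "drbdpool" is the default drbdmanage
volume group, it may be named differently):

drbdadm status vm-101-disk-1      # shows Diskless on the node without a local copy
lvs drbdpool                      # on the node holding the data: the backing LV
drbdmanage list-assignments       # cluster-wide view of the assignments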

The problem is restoring and creating VMs; here are some logs:




1) restoring a VM on pve215


Output from proxmox:
restore vma archive: lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
new volume ID is 'drbd1:vm-101-disk-1'
map 'drive-virtio0' to '/dev/drbd/by-res/vm-101-disk-1/0' (write zeros = 1)

** (process:6786): ERROR **: can't open file /dev/drbd/by-res/vm-101-disk-1/0 - Could not open '/dev/drbd/by-res/vm-101-disk-1/0': No such file or directory
/bin/bash: line 1:  6785 Broken pipe             lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo
       6786 Trace/breakpoint trap   | vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
temporary volume 'drbd1:vm-101-disk-1' sucessfuly removed
TASK ERROR: command 'lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783' failed: exit code 133
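
For reference, the restore shown above corresponds roughly to this CLI invocation
(a sketch; the VM ID 101 and the storage name are taken from the log above):

qmrestore /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo 101 --storage drbd1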



dmesg logs on pve215
[Fri Mar 24 17:33:53 2017] traps: vma[6786] trap int3 ip:7fea7dc35d30 sp:7ffe12b61f70 error:0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Starting worker thread (from drbdsetup [6830])
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Diskless -> Attaching )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: Maximum number of peer devices = 7
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Method to ensure write ordering: flush
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize called with capacity == 4194304
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: resync bitmap: bits=524288 words=57344 pages=112
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: recounting of set bits took additional 0ms
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Attaching -> UpToDate )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: attached to current UUID: 626E64181A1CD438
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: size = 2048 MB (2097152 KB)
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( UpToDate -> Detaching )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( Detaching -> Diskless )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize called with capacity == 0
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1: Terminating worker thread


pve216
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Preparing remote state change 4149564885 (primary_nodes=0, weak_nodes=0)
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Committing remote state change 4149564885
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108 pve214: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: ack_receiver terminated
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating ack_recv thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( TearDown -> Unconnected )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Restarting receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Unconnected -> Connecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connecting -> Disconnecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Disconnecting -> StandAlone )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating sender thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108: drbd_bm_resize called with capacity == 0
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1: Terminating worker thread


pve214
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Preparing cluster-wide state change 4149564885 (0->1 496/16)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: State change 4149564885: primary_nodes=0, weak_nodes=0
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Cluster is now split
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Committing cluster-wide state change 4149564885 (0ms)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1/0 drbd108 pve216: pdsk( Diskless -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: ack_receiver terminated
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating ack_recv thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Connection closed
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn( Disconnecting -> StandAlone )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating receiver thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating sender thread



2) restoring a VM on pve215

restore vma archive: lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
trying to aquire cfs lock 'storage-drbd1' ...TASK ERROR: command 'lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299' failed: got lock request timeout



pve215
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Starting worker thread (from drbdsetup [8064])
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Diskless -> Attaching )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: Maximum number of peer devices = 7
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Method to ensure write ordering: flush
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: drbd_bm_resize called with capacity == 4194304
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: resync bitmap: bits=524288 words=57344 pages=112
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: recounting of set bits took additional 0ms
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Attaching -> UpToDate )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: attached to current UUID: F1BF2127385E673F
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: size = 2048 MB (2097152 KB)


pve214
no log

pve216
no log


3) If I try to create a new VM, it often happens that drbdmanage misses something:

root at pve214:~# drbdmanage list-assignments
+-----------------------------------------------------------------------------------------------------------+
| Node   | Resource      | Vol ID |                                    |                               State |
|-------------------------------------------------------------------------------------------------------------|
| pve216 | vm-102-disk-1 |      * |                                    |         pending actions: commission |
| pve216 | vm-102-disk-1 |      0 |                                    | pending actions: commission, attach |

but on pve216...
root at pve216:~# drbdadm status
.drbdctrl role:Secondary
   volume:0 disk:UpToDate
   volume:1 disk:UpToDate
   pve214 role:Primary
     volume:0 peer-disk:UpToDate
     volume:1 peer-disk:UpToDate
   pve215 role:Secondary
     volume:0 peer-disk:UpToDate
     volume:1 peer-disk:UpToDate

vm-102-disk-1 role:Secondary
   disk:UpToDate

dmesg on pve216
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Starting worker thread (from drbdsetup [31664])
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Diskless -> Attaching )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: Maximum number of peer devices = 7
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Method to ensure write ordering: flush
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: drbd_bm_resize called with capacity == 2097152
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: resync bitmap: bits=262144 words=28672 pages=56
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: recounting of set bits took additional 0ms
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Attaching -> UpToDate )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: attached to current UUID: D38CF07DC8601CD9
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: size = 1024 MB (1048576 KB)


Then I did an assign on pve214 and an unassign on pve216 (drbdmanage removed
the LV too), and it worked.
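
Roughly, that manual fix was (a sketch from memory; I'm not sure of the exact
argument order, check drbdmanage assign-resource --help):

drbdmanage assign-resource vm-102-disk-1 pve214      # assumed order: <resource> <node>
drbdmanage unassign-resource vm-102-disk-1 pve216
drbdmanage list-assignments                          # wait until no pending actions remain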



I also have a PVE 4.3 cluster with drbdmanage 0.97; sometimes a restore hangs there, but there are ...

Thank you, and I hope this is useful for the developers.

Den



