Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi, I'm testing a 3-node cluster, freshly installed from the LINBIT repo for PVE with the latest versions. I have configured LVM as the backend storage and set redundancy to 1 for now. So when I create a VM on node1, for example, it can happen that the resource is allocated on node2 and I get a diskless resource on node1, but that does not seem to be a problem. The problem is restoring and creating VMs; here are some logs:

1) Restoring a VM on pve215

Output from Proxmox:

restore vma archive: lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
new volume ID is 'drbd1:vm-101-disk-1'
map 'drive-virtio0' to '/dev/drbd/by-res/vm-101-disk-1/0' (write zeros = 1)
** (process:6786): ERROR **: can't open file /dev/drbd/by-res/vm-101-disk-1/0 - Could not open '/dev/drbd/by-res/vm-101-disk-1/0': No such file or directory
/bin/bash: line 1: 6785 Broken pipe             lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo
                   6786 Trace/breakpoint trap   | vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
temporary volume 'drbd1:vm-101-disk-1' sucessfuly removed
TASK ERROR: command 'lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783' failed: exit code 133

dmesg on pve215:

[Fri Mar 24 17:33:53 2017] traps: vma[6786] trap int3 ip:7fea7dc35d30 sp:7ffe12b61f70 error:0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Starting worker thread (from drbdsetup [6830])
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Diskless -> Attaching )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: Maximum number of peer devices = 7
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Method to ensure write ordering: flush
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize called with capacity == 4194304
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: resync bitmap: bits=524288 words=57344 pages=112
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: recounting of set bits took additional 0ms
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Attaching -> UpToDate )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: attached to current UUID: 626E64181A1CD438
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: size = 2048 MB (2097152 KB)
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( UpToDate -> Detaching )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( Detaching -> Diskless )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize called with capacity == 0
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1: Terminating worker thread

dmesg on pve216:

[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Preparing remote state change 4149564885 (primary_nodes=0, weak_nodes=0)
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Committing remote state change 4149564885
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108 pve214: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: ack_receiver terminated
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating ack_recv thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( TearDown -> Unconnected )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Restarting receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Unconnected -> Connecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connecting -> Disconnecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Disconnecting -> StandAlone )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating sender thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108: drbd_bm_resize called with capacity == 0
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1: Terminating worker thread

dmesg on pve214:

[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Preparing cluster-wide state change 4149564885 (0->1 496/16)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: State change 4149564885: primary_nodes=0, weak_nodes=0
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Cluster is now split
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Committing cluster-wide state change 4149564885 (0ms)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1/0 drbd108 pve216: pdsk( Diskless -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: ack_receiver terminated
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating ack_recv thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Connection closed
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn( Disconnecting -> StandAlone )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating receiver thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating sender thread
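To me it looks like vma extract opens /dev/drbd/by-res/vm-101-disk-1/0 before the device node actually exists, and the dmesg above shows the attach happening in the same second. As a quick test I was thinking of something like the untested sketch below, which only polls for the by-res symlink before touching it; the resource name and the 30-second timeout are just examples from my setup, not anything official:

    # untested sketch: wait for the by-res device to show up before using it
    RES=vm-101-disk-1
    DEV=/dev/drbd/by-res/$RES/0
    for i in $(seq 1 30); do
        [ -b "$DEV" ] && break     # stop as soon as the block device exists
        sleep 1
    done
    [ -b "$DEV" ] || echo "$DEV still missing after 30s"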
2) Restoring a VM on pve215 (second attempt)

Output from Proxmox:

restore vma archive: lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
trying to aquire cfs lock 'storage-drbd1' ...
TASK ERROR: command 'lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299' failed: got lock request timeout

dmesg on pve215:

[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Starting worker thread (from drbdsetup [8064])
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Diskless -> Attaching )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: Maximum number of peer devices = 7
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Method to ensure write ordering: flush
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: drbd_bm_resize called with capacity == 4194304
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: resync bitmap: bits=524288 words=57344 pages=112
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: recounting of set bits took additional 0ms
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Attaching -> UpToDate )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: attached to current UUID: F1BF2127385E673F
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: size = 2048 MB (2097152 KB)

pve214: no log
pve216: no log
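I don't know yet what is actually holding the 'storage-drbd1' lock here; my guess is that the storage plugin is still busy cleaning up after the failed restore above. Roughly what I check by hand to see whether anything was left behind is the following (inspection only, not a fix; 'drbdpool' is simply the default volume group that drbdmanage created on my nodes):

    # inspection only: look for leftovers from the failed restore
    drbdmanage list-assignments      # any "pending actions" for vm-101-disk-1?
    drbdmanage list-volumes          # does the volume still exist in drbdmanage?
    drbdadm status vm-101-disk-1     # does the kernel still know the resource?
    lvs drbdpool                     # was the backing LV left behind?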
3) If I try to create a new VM, it often happens that drbdmanage misses something:

root@pve214:~# drbdmanage list-assignments
+--------------------------------------------------------------------------+
| Node   | Resource      | Vol ID |  | State                               |
|--------------------------------------------------------------------------|
| pve216 | vm-102-disk-1 |      * |  | pending actions: commission         |
| pve216 | vm-102-disk-1 |      0 |  | pending actions: commission, attach |
+--------------------------------------------------------------------------+

but on pve216...

root@pve216:~# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  pve214 role:Primary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate
  pve215 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

vm-102-disk-1 role:Secondary
  disk:UpToDate

dmesg on pve216:

[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Starting worker thread (from drbdsetup [31664])
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Diskless -> Attaching )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: Maximum number of peer devices = 7
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Method to ensure write ordering: flush
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: drbd_bm_resize called with capacity == 2097152
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: resync bitmap: bits=262144 words=28672 pages=56
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: recounting of set bits took additional 0ms
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Attaching -> UpToDate )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: attached to current UUID: D38CF07DC8601CD9
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: size = 1024 MB (1048576 KB)

Then I did an assign on pve214 and an unassign on pve216 (drbdmanage removed the LV too), and it worked.
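For the record, that assign/unassign was just done with drbdmanage on the command line, roughly as below; the syntax is from memory (depending on the drbdmanage version it may be the short "assign"/"unassign" aliases instead), and the resource and node names are of course specific to my cluster:

    # roughly the workaround used above (syntax from memory)
    drbdmanage assign-resource vm-102-disk-1 pve214
    drbdmanage unassign-resource vm-102-disk-1 pve216   # this also removed the backing LV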
I also have a PVE 4.3 cluster with drbdmanage 0.97. Sometimes restore hangs there too, but there are

Thank you, and I hope this is useful for the developers.

Den