[DRBD-user] 2 node cluster split-brain on linstor_db
Martin
mlc42 at gmx.de
Tue Jun 8 22:44:57 CEST 2021
I'm trying to build a 2 node cluster with Proxmox and DRBD, plus an extra
qdevice to have 3 votes.
node 1: 1 Gbit NIC 192.168.1.245, 2.5 Gbit NIC 192.168.3.1
node 2: 1 Gbit NIC 192.168.1.246, 2.5 Gbit NIC 192.168.3.2
After installing Proxmox 6.4 I installed DRBD 9/LINSTOR.
#apt install linstor-controller linstor-satellite linstor-client
#systemctl start linstor-satellite
#systemctl enable linstor-satellite
#systemctl start linstor-controller
#systemctl enable linstor-controller
#linstor node create proxmoxn1 192.168.3.1 --node-type Combined
#linstor node create proxmoxn2 192.168.3.2 --node-type Combined
/etc/linstor/linstor-client.conf
[global]
controllers=proxmoxn1,proxmoxn2
#create a partition with fdisk /dev/nvme0n1
#vgcreate vg_ssd /dev/nvme0n1p4
On the first node
#linstor storage-pool create lvm proxmoxn1 pool_ssd vg_ssd
#linstor storage-pool create lvm proxmoxn2 pool_ssd vg_ssd
#linstor resource-group create adcgrp --storage-pool pool_ssd --place-count 2
#linstor vg create adcgrp
On both nodes
#apt install linstor-proxmox
/etc/pve/storage.cfg
drbd: drbdstorage
   content images,rootdir
   controller 192.168.3.1,192.168.3.2
   resourcegroup adcgrp
#systemctl restart pvedaemon
Making linstor HA
#linstor resource-definition create linstor_db
#linstor resource-definition set-property linstor_db DrbdOptions/Resource/on-no-quorum io-error
#linstor volume-definition create linstor_db 200M
#linstor resource create linstor_db -s pool_ssd --auto-place 2
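For context on the on-no-quorum property set above: whether a node keeps quorum comes down to counting voters. The following is only a minimal sketch, assuming a plain strict-majority rule; DRBD's real quorum logic has additional tie-breaking rules that are not modelled here, and the function name has_majority is hypothetical.

```shell
# has_majority REACHABLE TOTAL
# Succeeds when REACHABLE voters (including this node) form a strict
# majority of TOTAL voters. Simplified model, not DRBD's implementation.
has_majority() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Walk through a few "reachable total" combinations.
for pair in "1 2" "2 2" "1 3" "2 3"; do
    set -- $pair
    if has_majority "$1" "$2"; then
        echo "$1 of $2 voters: quorum"
    else
        echo "$1 of $2 voters: no quorum"
    fi
done
```

Under this simplified model, a resource with only two voters loses quorum on both sides as soon as the peers cannot see each other, whereas with three voters one side can still hold a majority.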
On both nodes
#systemctl disable --now linstor-controller
#cat << EOF > /etc/systemd/system/var-lib-linstor.mount
[Unit]
Description=Filesystem for the LINSTOR controller
[Mount]
# you can use the minor like /dev/drbdX or the udev symlink
What=/dev/drbd/by-res/linstor_db/0
Where=/var/lib/linstor
EOF
#mv /var/lib/linstor{,.orig}
#mkfs.ext4 /dev/drbd/by-res/linstor_db/0
#systemctl start var-lib-linstor.mount
#cp -r /var/lib/linstor.orig/* /var/lib/linstor
#systemctl start linstor-controller
#scp /etc/systemd/system/var-lib-linstor.mount root@192.168.1.246:/etc/systemd/system/var-lib-linstor.mount
#systemctl start linstor-controller
#apt install drbd-reactor
#mkdir /etc/drbd-reactor.d
/etc/drbd-reactor.d/linstor.toml
[[promoter]]
[promoter.resources.linstor_db]
start = ["var-lib-linstor.mount", "linstor-controller.service"]
#systemctl restart drbd-reactor
#systemctl enable drbd-reactor
#systemctl edit linstor-satellite
[Service]
Environment=LS_KEEP_RES=linstor_db
[Unit]
After=drbd-reactor.service
#systemctl restart linstor-satellite
I can create VMs and everything seems to be OK.
After rebooting both nodes, linstor and drbdadm show the state listed below.
The VM is now very slow (10 times slower than on Proxmox LVM-thin).
dmesg shows a split-brain only for linstor_db:
[ 17.632010] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm initial-split-brain
[ 17.632621] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm initial-split-brain exit code 0
[ 17.632627] drbd linstor_db/0 drbd1001: Split-Brain detected but unresolved, dropping connection!
[ 17.632646] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm split-brain
[ 17.633208] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm split-brain exit code 0
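To check quickly which resources the kernel has flagged, the log can be filtered for the "Split-Brain detected" message. This is a small sketch; the function name split_brain_resources is hypothetical, and the pattern assumes the log line layout shown in the excerpt above.

```shell
# Print the name of every DRBD resource for which the kernel log
# (read from stdin) reports a detected split-brain.
split_brain_resources() {
    # Lines look like:
    #   [ 17.6] drbd <resource>/<volume> drbdNNNN: Split-Brain detected ...
    sed -n 's|^.*drbd \([^ /]*\)/[0-9]* .*Split-Brain detected.*$|\1|p' | sort -u
}

# Sample input taken from the excerpt above; on a live node one would
# pipe the output of dmesg into the function instead.
split_brain_resources <<'EOF'
[ 17.632621] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm initial-split-brain exit code 0
[ 17.632627] drbd linstor_db/0 drbd1001: Split-Brain detected but unresolved, dropping connection!
EOF
```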
Even manually resolving the split-brain doesn't work.
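For reference, the manual recovery sequence described in the DRBD 9 User's Guide can be sketched as below. Which node's data gets discarded is an assumption the operator has to make. The helper names (run, discard_local_changes, reconnect_survivor) are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. It also assumes that nothing holds the device open on the node being demoted (filesystem unmounted, services using it stopped), since drbdadm secondary cannot demote a device that is in use.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the node whose changes are to be thrown away (the "victim").
discard_local_changes() {
    res=$1
    run drbdadm disconnect "$res"
    run drbdadm secondary "$res"
    run drbdadm connect --discard-my-data "$res"
}

# Run on the node whose data is kept, if it shows StandAlone.
reconnect_survivor() {
    res=$1
    run drbdadm disconnect "$res"
    run drbdadm connect "$res"
}

# Example (dry run): what would be executed on the victim node.
discard_local_changes linstor_db
```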
Output of the nodes
First node
root@proxmoxn1:~# linstor r l
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName  ┊ Node      ┊ Port ┊ Usage  ┊ Conns                 ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db    ┊ proxmoxn1 ┊ 7001 ┊ InUse  ┊ StandAlone(proxmoxn2) ┊ UpToDate ┊ 2021-06-01 21:34:35 ┊
┊ linstor_db    ┊ proxmoxn2 ┊ 7001 ┊ InUse  ┊ Connecting(proxmoxn1) ┊ UpToDate ┊ 2021-06-01 21:34:35 ┊
┊ vm-100-disk-1 ┊ proxmoxn1 ┊ 7000 ┊ Unused ┊ Ok                    ┊ UpToDate ┊ 2021-05-29 12:30:08 ┊
┊ vm-100-disk-1 ┊ proxmoxn2 ┊ 7000 ┊ Unused ┊ Ok                    ┊ UpToDate ┊ 2021-05-29 12:30:07 ┊
┊ vm-108-disk-1 ┊ proxmoxn1 ┊ 7002 ┊ InUse  ┊ StandAlone(proxmoxn2) ┊ UpToDate ┊ 2021-06-06 21:01:10 ┊
┊ vm-108-disk-1 ┊ proxmoxn2 ┊ 7002 ┊ Unused ┊ Connecting(proxmoxn1) ┊ UpToDate ┊ 2021-06-06 21:01:10 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@proxmoxn1:~# drbdadm status
linstor_db role:Primary
  disk:UpToDate
  proxmoxn2 connection:StandAlone

vm-100-disk-1 role:Secondary
  disk:UpToDate
  proxmoxn2 role:Secondary
    peer-disk:UpToDate

vm-108-disk-1 role:Primary
  disk:UpToDate
  proxmoxn2 connection:StandAlone

root@proxmoxn1:~#
Second node
root@proxmoxn2:~# drbdadm status
linstor_db role:Primary
  disk:UpToDate
  proxmoxn1 connection:Connecting

vm-100-disk-1 role:Secondary
  disk:UpToDate
  proxmoxn1 role:Secondary
    peer-disk:UpToDate

vm-108-disk-1 role:Secondary
  disk:UpToDate
  proxmoxn1 connection:Connecting
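The status listings above can be condensed into one line per broken connection, which makes it easier to compare the two nodes (linstor_db is reported as Primary on both). This is a rough sketch: the function name disconnected_peers is hypothetical, and it assumes the usual drbdadm status layout in which resource lines start in column 1 and peer lines are indented.

```shell
# Read "drbdadm status" text on stdin and print, for every peer that is
# not connected: resource, local role, peer name, connection state.
disconnected_peers() {
    awk '
        # A line starting in column 1 opens a new resource block.
        /^[^ ]/ { res = $1; sub(/^role:/, "", $2); role = $2; next }
        # Indented peer lines carry connection:<state> when not connected.
        $2 ~ /^connection:/ { sub(/^connection:/, "", $2); print res, role, $1, $2 }
    '
}

# Sample input: the first node output from above. On a live node one
# would run: drbdadm status | disconnected_peers
disconnected_peers <<'EOF'
linstor_db role:Primary
  disk:UpToDate
  proxmoxn2 connection:StandAlone

vm-100-disk-1 role:Secondary
  disk:UpToDate
  proxmoxn2 role:Secondary
    peer-disk:UpToDate

vm-108-disk-1 role:Primary
  disk:UpToDate
  proxmoxn2 connection:StandAlone
EOF
```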
I've read the docs again and again, but no luck.
Can anybody help?
Martin