[DRBD-user] 2-node cluster: split-brain on linstor_db

Martin mlc42 at gmx.de
Tue Jun 8 22:44:57 CEST 2021


I'm trying to build a 2-node cluster with Proxmox and DRBD, plus an extra
QDevice so that the cluster has 3 votes.

node 1 (proxmoxn1): 1 GbE NIC 192.168.1.245   2.5 GbE NIC 192.168.3.1
node 2 (proxmoxn2): 1 GbE NIC 192.168.1.246   2.5 GbE NIC 192.168.3.2


After installing Proxmox VE 6.4, I installed DRBD 9 and LINSTOR.


#apt install linstor-controller linstor-satellite linstor-client
#systemctl start linstor-satellite
#systemctl enable linstor-satellite

#systemctl start linstor-controller
#systemctl enable linstor-controller

#linstor node create proxmoxn1 192.168.3.1 --node-type Combined
#linstor node create proxmoxn2 192.168.3.2 --node-type Combined

/etc/linstor/linstor-client.conf
     [global]
     controllers=proxmoxn1,proxmoxn2

#create a partition with fdisk /dev/nvme0n1
#vgcreate vg_ssd /dev/nvme0n1p4

On the first node
#linstor storage-pool create lvm proxmoxn1 pool_ssd vg_ssd
#linstor storage-pool create lvm proxmoxn2 pool_ssd vg_ssd

#linstor resource-group create adcgrp --storage-pool pool_ssd --place-count 2
#linstor vg create adcgrp

On both nodes
#apt install linstor-proxmox

/etc/pve/storage.cfg
drbd: drbdstorage
     content images, rootdir
     controller 192.168.3.1,192.168.3.2
     resourcegroup adcgrp

#systemctl restart pvedaemon

Making the LINSTOR controller highly available
#linstor resource-definition create linstor_db
#linstor resource-definition set-property linstor_db DrbdOptions/Resource/on-no-quorum io-error
#linstor volume-definition create linstor_db 200M
#linstor resource create linstor_db -s pool_ssd --auto-place 2

On both nodes
#systemctl disable --now linstor-controller

#cat << EOF > /etc/systemd/system/var-lib-linstor.mount
[Unit]
Description=Filesystem for the LINSTOR controller

[Mount]
# you can use the minor like /dev/drbdX or the udev symlink
What=/dev/drbd/by-res/linstor_db/0
Where=/var/lib/linstor
EOF

#mv /var/lib/linstor{,.orig}
#mkfs.ext4 /dev/drbd/by-res/linstor_db/0
#systemctl start var-lib-linstor.mount

#cp -r /var/lib/linstor.orig/* /var/lib/linstor
#systemctl start linstor-controller
#scp /etc/systemd/system/var-lib-linstor.mount root@192.168.1.246:/etc/systemd/system/var-lib-linstor.mount

#systemctl start linstor-controller

#apt install drbd-reactor
#mkdir /etc/drbd-reactor.d
/etc/drbd-reactor.d/linstor.toml
     [[promoter]]
     [promoter.resources.linstor_db]
     start = ["var-lib-linstor.mount", "linstor-controller.service"]

#systemctl restart drbd-reactor
#systemctl enable drbd-reactor

#systemctl edit linstor-satellite
     [Service]
     Environment=LS_KEEP_RES=linstor_db
     [Unit]
     After=drbd-reactor.service


#systemctl restart linstor-satellite
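Because `systemctl edit` is interactive, the same drop-in could also be written non-interactively, which makes it easier to apply identically on both nodes. A minimal sketch, assuming the default drop-in path that `systemctl edit` uses (`linstor-satellite.service.d/override.conf`); the function name `write_satellite_override` and the `ROOT` variable are hypothetical, added only so the snippet can be tried against a scratch directory:

```shell
#!/bin/sh
# Hypothetical helper (sketch only): write the same drop-in that
# `systemctl edit linstor-satellite` would create.
# ROOT defaults to "" (the real filesystem); point it at a scratch
# directory to try the function without touching /etc.
write_satellite_override() {
    dir="${ROOT:-}/etc/systemd/system/linstor-satellite.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" << 'EOF'
[Service]
Environment=LS_KEEP_RES=linstor_db
[Unit]
After=drbd-reactor.service
EOF
}
```

On a real node one would call it with ROOT unset and then run `systemctl daemon-reload` before restarting linstor-satellite.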

I can create VMs and everything seems to work.

After rebooting both nodes, linstor and drbdadm show the state below, and the VM
is now very slow (10 times slower than on Proxmox LVM-thin storage).

dmesg shows a split-brain only for linstor_db:
[   17.632010] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm initial-split-brain
[   17.632621] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm initial-split-brain exit code 0
[   17.632627] drbd linstor_db/0 drbd1001: Split-Brain detected but unresolved, dropping connection!
[   17.632646] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm split-brain
[   17.633208] drbd linstor_db/0 drbd1001 proxmoxn2: helper command: /sbin/drbdadm split-brain exit code 0
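To check which DRBD resources logged a split-brain after a reboot, the kernel log could be filtered like this. It is only a sketch that assumes the message format shown above; `split_brain_resources` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper (sketch only): read kernel log lines on stdin and
# print each DRBD resource name that reported "Split-Brain detected",
# once per resource. Assumes the format
#   "... drbd <resource>/<volume> <minor>...: Split-Brain detected ..."
split_brain_resources() {
    grep 'Split-Brain detected' |
        sed -n 's|.*drbd \([^/ ]*\)/[0-9]* .*|\1|p' |
        sort -u
}
```

Usage would be `dmesg | split_brain_resources`; for the lines above it prints only `linstor_db`.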

Even resolving the split-brain manually doesn't work.
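For reference, the generic manual recovery described in the DRBD 9 user's guide is: on the node whose changes are to be discarded (the "victim"), disconnect, demote, and reconnect with `--discard-my-data`; on the other node (the "survivor"), run `drbdadm connect` only if it is StandAlone as well. The sketch below just prints those commands (dry run by default); `sb_recover` and `RUN` are hypothetical names, and choosing the victim is a judgement call. Note that `drbdadm secondary` fails while the device is still in use, so on the victim the mount and controller started by drbd-reactor would have to be stopped first.

```shell
#!/bin/sh
# Hypothetical helper (sketch only): print, or run, the generic DRBD 9
# manual split-brain recovery steps for one resource.
# Default is a dry run that only echoes the commands; call it as
#   RUN= sb_recover <resource> <role>
# to execute them for real.
#   victim   : the node whose changes since the split are discarded
#   survivor : the node whose data is kept (only needed if StandAlone)
sb_recover() {
    res=$1 role=$2
    run=${RUN-echo}
    case $role in
        victim)
            $run drbdadm disconnect "$res"
            $run drbdadm secondary "$res"
            $run drbdadm connect --discard-my-data "$res"
            ;;
        survivor)
            $run drbdadm connect "$res"
            ;;
        *)
            echo "usage: sb_recover <resource> victim|survivor" >&2
            return 2
            ;;
    esac
}
```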

Output of the nodes

First node

root@proxmoxn1:~# linstor r l
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName  ┊ Node      ┊ Port ┊ Usage  ┊ Conns                 ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db    ┊ proxmoxn1 ┊ 7001 ┊ InUse  ┊ StandAlone(proxmoxn2) ┊ UpToDate ┊ 2021-06-01 21:34:35 ┊
┊ linstor_db    ┊ proxmoxn2 ┊ 7001 ┊ InUse  ┊ Connecting(proxmoxn1) ┊ UpToDate ┊ 2021-06-01 21:34:35 ┊
┊ vm-100-disk-1 ┊ proxmoxn1 ┊ 7000 ┊ Unused ┊ Ok                    ┊ UpToDate ┊ 2021-05-29 12:30:08 ┊
┊ vm-100-disk-1 ┊ proxmoxn2 ┊ 7000 ┊ Unused ┊ Ok                    ┊ UpToDate ┊ 2021-05-29 12:30:07 ┊
┊ vm-108-disk-1 ┊ proxmoxn1 ┊ 7002 ┊ InUse  ┊ StandAlone(proxmoxn2) ┊ UpToDate ┊ 2021-06-06 21:01:10 ┊
┊ vm-108-disk-1 ┊ proxmoxn2 ┊ 7002 ┊ Unused ┊ Connecting(proxmoxn1) ┊ UpToDate ┊ 2021-06-06 21:01:10 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@proxmoxn1:~# drbdadm status
linstor_db role:Primary
   disk:UpToDate
   proxmoxn2 connection:StandAlone

vm-100-disk-1 role:Secondary
   disk:UpToDate
   proxmoxn2 role:Secondary
     peer-disk:UpToDate

vm-108-disk-1 role:Primary
   disk:UpToDate
   proxmoxn2 connection:StandAlone

root@proxmoxn1:~#

Second node

root@proxmoxn2:~# drbdadm status
linstor_db role:Primary
   disk:UpToDate
   proxmoxn1 connection:Connecting

vm-100-disk-1 role:Secondary
   disk:UpToDate
   proxmoxn1 role:Secondary
     peer-disk:UpToDate

vm-108-disk-1 role:Secondary
   disk:UpToDate
   proxmoxn1 connection:Connecting
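With more resources, the disconnected peers can be picked out of `drbdadm status` output mechanically. A sketch, assuming the layout shown above (resource lines start in column 0, and a peer line carries `connection:<State>` only when it is not connected); `disconnected_peers` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper (sketch only): read `drbdadm status` output on stdin
# and print "<resource> <peer> <state>" for every peer whose line shows a
# "connection:<State>" field. Connected peers show "role:..." instead and
# are skipped.
disconnected_peers() {
    awk '
        /^[^ ]/ { res = $1; next }
        $2 ~ /^connection:/ {
            split($2, kv, ":")
            print res, $1, kv[2]
        }
    '
}
```

For the second node's output above, `drbdadm status | disconnected_peers` would print `linstor_db proxmoxn1 Connecting` and `vm-108-disk-1 proxmoxn1 Connecting`.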

I've read the docs again and again, but no luck.
Can anybody help?

Martin