[DRBD-user] Softlockup when using 9.0.15-1 version

Nicholas Morton kmorton at cancinc.com
Wed Apr 10 22:53:51 CEST 2019


I recently ran into this issue as well, with a couple of nodes losing network
connectivity and logging messages like these:
"NMI watchdog: Watchdog detected hard"
"watchdog: BUG: soft lockup - CPU#xx stuck for 22s!"
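In the kernel log, a soft-lockup report like the one quoted above is followed by the CPU state and a call trace. The minimal Python sketch below pulls those reports out of a saved kernel log, such as the output of `dmesg` or `journalctl -k` redirected to a file. The function name, the 40-line `max_context` cut-off, and the assumption that the trace sits directly after the lockup line are illustrative choices, not anything from DRBD or the kernel. It only matches the soft-lockup wording, not the NMI hard-lockup message.

```python
import re
import sys
from typing import Iterator, List, NamedTuple

# Matches the kernel's soft-lockup report, e.g.
# "watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [kworker/12:1:345]"
SOFT_LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")


class LockupReport(NamedTuple):
    cpu: int
    stuck_seconds: int
    context: List[str]  # lines that followed the report (registers, call trace)


def find_soft_lockups(lines: List[str], max_context: int = 40) -> Iterator[LockupReport]:
    """Yield one report per soft-lockup line, keeping up to max_context
    following lines and stopping early at the next soft-lockup line.
    max_context is a hypothetical default, not a kernel value."""
    for i, line in enumerate(lines):
        match = SOFT_LOCKUP_RE.search(line)
        if not match:
            continue
        context = []
        for follow in lines[i + 1 : i + 1 + max_context]:
            if SOFT_LOCKUP_RE.search(follow):
                break
            context.append(follow.rstrip("\n"))
        yield LockupReport(int(match.group(1)), int(match.group(2)), context)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_lockups.py kern.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        log_lines = f.readlines()
    for report in find_soft_lockups(log_lines):
        print(f"CPU#{report.cpu} stuck for {report.stuck_seconds}s")
        for ctx in report.context:
            print("    " + ctx)
```

For example, after `dmesg > kern.log`, running `python find_lockups.py kern.log` prints each lockup with the lines that followed it. The script name is hypothetical.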

I don't know how to collect the full stack traces, but these are the
versions I was running:
root@vmhost1:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.3-12 (running version: 5.3-12/5fbbbaf6)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-48
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-39
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-24
pve-cluster: 5.0-34
pve-container: 2.0-35
pve-docs: 5.3-3
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-8
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
and these DRBD/LINSTOR packages:
drbd-dkms/unknown,now 9.0.17-1
drbd-utils/unknown,now 9.8.0-1
linstor-common/unknown,now 0.9.4-1
linstor-proxmox/unknown,now 3.0.3-1
linstor-satellite/unknown,now 0.9.4-1
python-linstor/unknown,now 0.9.1-1

One of my nodes repeatedly locked up shortly after it started synchronizing.
After I downgraded drbd-dkms to 9.0.14-1, my cluster has been stable again.
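One way to keep such a downgrade from being undone by a later upgrade is an apt preferences pin. This is a sketch and not something described in this report. It assumes the 9.0.14-1 package is still offered by one of the configured repositories. A file such as `/etc/apt/preferences.d/drbd-dkms` (the name is arbitrary) containing the stanza below makes apt prefer that version of `drbd-dkms` over newer ones. A priority above 1000 is what lets apt select a pinned version even when that means a downgrade.

```
Package: drbd-dkms
Pin: version 9.0.14-1
Pin-Priority: 1001
```

`apt-cache policy drbd-dkms` shows whether the pin is in effect.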

Just an FYI,
Thanks
