[DRBD-user] Softlockup when using 9.0.15-1 version
Köster, Michael
Michael.Koester at funkemedien.de
Thu Apr 11 11:35:53 CEST 2019
Dear Nicholas,
the issue could be solved, in my case, by updating to the latest version, i.e. 9.0.17-1.
Even the bonding seems to work now. Be careful while updating: if you reboot a node while the two nodes are still running different versions, the error might come up again. So I recommend keeping the downtime between the nodes as short as possible and the change rate on the DRBD devices low.
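For what it's worth, a rough way to check that a node is actually running the new module, and not just carrying the new package, would be something like the following (assuming the module is loaded and was built via dkms):

    # version of the DRBD module currently loaded in the kernel
    cat /proc/drbd | head -n 1
    modinfo drbd | grep ^version
    # version of the installed packages
    dpkg -l drbd-dkms drbd-utils
    # whether dkms has built the module for the running kernel
    dkms status drbd

If /proc/drbd still reports the old version after the new package is installed, that node has not reloaded the module yet and is the one to watch until its reboot is done.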
Best regards,
Michael Köster
Group Leader
______________________________________________________
Infrastruktur + Service
Datacenter Service
Server + Daten
FUNKE Corporate IT GmbH
Hintern Brüdern 23, 38100 Braunschweig
Phone: +49 531 3900 224
Fax: +49 531 3900 148
E-Mail: michael.koester at funkemedien.de
____________________________________________________
FUNKE Corporate IT GmbH
Jacob-Funke-Platz 1, 45127 Essen
Registered office Essen, Register Court Essen HRB 23241
Managing Directors: Michael Kurowski, Andreas Schoo, Michael Wüller
-----Original Message-----
From: drbd-user-bounces at lists.linbit.com <drbd-user-bounces at lists.linbit.com> On behalf of Nicholas Morton
Sent: Wednesday, April 10, 2019 22:54
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Softlockup when using 9.0.15-1 version
I recently also ran into this issue: a couple of nodes lost network connectivity and showed various messages like these:
"NMI watchdog: Watchdog detected hard"
"watchdog: BUG: soft lockup - CPU#xx stuck for 22s!"
I don't really know how to work through all the stack trace stuff, but I was running these versions:
root at vmhost1:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.3-12 (running version: 5.3-12/5fbbbaf6)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-48
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-39
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-24
pve-cluster: 5.0-34
pve-container: 2.0-35
pve-docs: 5.3-3
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-8
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
and
drbd-dkms/unknown,now 9.0.17-1
drbd-utils/unknown,now 9.8.0-1
linstor-common/unknown,now 0.9.4-1
linstor-proxmox/unknown,now 3.0.3-1
linstor-satellite/unknown,now 0.9.4-1
python-linstor/unknown,now 0.9.1-1
One of my nodes would lock up shortly after it started synchronizing, and it did so repeatedly.
After I downgraded drbd-dkms to 9.0.14-1, my cluster has been stable again.
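In case anyone wants to do the same, the downgrade was roughly the following (assuming the 9.0.14-1 package is still available in the configured LINBIT repository):

    # install the older drbd-dkms build and keep apt from upgrading it again
    apt-get install drbd-dkms=9.0.14-1
    apt-mark hold drbd-dkms
    # after the dkms rebuild and a module reload/reboot, confirm what is loaded
    cat /proc/drbd | head -n 1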
Just an FYI,
Thanks
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user