[DRBD-user] DRBD critical error: CPU locks on bonded network interface
Köster, Michael
Michael.Koester at funkemedien.de
Mon Feb 4 12:08:08 CET 2019
Dear all,
I have most likely found a bug that occurs when DRBD runs over a bonded network interface. The error occurred rarely with 9.0.14-1 and occurs almost every time with 9.0.16-1. (With 9.0.14-1, a reboot sometimes fixes the issue.)
Result of the bug: the DRBD module locks up the CPUs, and the kernel starts killing processes.
Error message:
kernel:[ 83.811642] NMI watchdog: Watchdog detected hard LOCKUP on cpu 34
(Sometimes it is a soft lockup instead.)
Attached is an image of a system start. The cluster is resyncing, and the error occurs while the server is still in the middle of this resync.
Setup:
We have several Proxmox clusters in different locations (with different network and server hardware), and the issue occurs on every cluster despite their differing configurations:
- Cluster members (2 or 4)
- Server (Dell R730 or Thomas Krenn servers)
- Network bond for DRBD (bond-mode active-backup or 802.3ad) over two 10G interfaces
- We have also replaced the network components (switches, cables, etc.) between the Proxmox virtualization hosts.
I am using the latest version and patches of PVE:
Linux 4.15.18-10-pve #1 SMP PVE 4.15.18-32 (Sat, 19 Jan 2019 10:09:37 +0100) x86_64
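For readers unfamiliar with this kind of setup, a two-port bond like the one described above is typically declared in /etc/network/interfaces on Proxmox VE roughly as follows. This is only a sketch: the bond name bond0, the port names eno1/eno2 and the address are hypothetical placeholders, not values from the attached interfaces.txt, which holds the real configuration.

```
# Hypothetical example: bond name, port names and address are placeholders,
# not taken from the attached interfaces.txt.
auto bond0
iface bond0 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        # the two 10G ports enslaved to the bond
        bond-slaves eno1 eno2
        # check link state every 100 ms
        bond-miimon 100
        # active-backup was also tested; 802.3ad needs LACP on the switch side
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
```

At runtime, the bond's mode, currently active port, per-port link status and (for 802.3ad) LACP aggregator details can be inspected with `cat /proc/net/bonding/bond0`.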
The DRBD version:
cat /proc/drbd
version: 9.0.16-1 (api:2/proto:86-114)
GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root at esproxmox01a, 2019-01-28 10:00:32
Transports (api:16): tcp (9.0.16-1)
The DRBD settings are as follows (the error occurs even when these values are unset):
drbdmanage peer-device-options --common --c-fill-target 10000;
drbdmanage peer-device-options --common --c-max-rate 700000;
drbdmanage peer-device-options --common --c-plan-ahead 7;
drbdmanage peer-device-options --common --c-min-rate 4000;
drbdmanage peer-device-options --common --resync-rate 500000;
No additional settings are set.
DRBD volumes and resources are managed with drbdmanage:
cat /etc/drbdmanaged.cfg
[LOCAL]
storage-plugin = drbdmanage.storage.lvm.Lvm
force=1
The network settings are in the attached file.
Is there anything I need to change, or is DRBD over bonded interfaces not supported?
Thanks in advance for your help.
Kind regards
Michael Köster
Group Leader
______________________________________________________
Infrastructure + Service
Datacenter Service
Server + Data
FUNKE Corporate IT GmbH
Hintern Brüdern 23, 38100 Braunschweig
Phone: +49 531 3900 224
Fax: +49 531 3900 148
E-Mail: michael.koester at funkemedien.de
____________________________________________________
FUNKE Corporate IT GmbH
Jacob-Funke-Platz 1, 45127 Essen
Registered office: Essen, Register court: Essen, HRB 23241
Managing Directors: Michael Kurowski, Andreas Schoo, Michael Wüller
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iKVM_capture.jpg
Type: image/jpeg
Size: 665900 bytes
Desc: iKVM_capture.jpg
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbdctrl.res
Type: application/octet-stream
Size: 1008 bytes
Desc: drbdctrl.res
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interfaces.txt
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.txt>