[DRBD-user] DRBD critical error: CPU locks on bonded network interface

Köster, Michael Michael.Koester at funkemedien.de
Mon Feb 4 12:08:08 CET 2019


Dear all,

I have most likely found a bug that occurs when using DRBD over a bonded network interface. The error occurred rarely with 9.0.14-1 and occurs almost always with 9.0.16-1. (With 9.0.14-1, a reboot sometimes fixes the issue.)

Result of the bug: the DRBD module locks up the CPUs and the kernel starts killing processes.

Error message:
 kernel:[   83.811642] NMI watchdog: Watchdog detected hard LOCKUP on cpu 34
(Sometimes it is a soft lockup instead.)

Attached is a screenshot of a system start. The cluster is resyncing, and the error occurs while the server is still in the middle of this resync.

Setting:
We have several Proxmox clusters in different locations (with different network and server hardware), and the issue arises in every cluster despite the differing configurations:
- Number of cluster members (2 or 4)
- Server hardware (Dell R730 or Thomas Krenn servers)
- Network bond (bond-mode active-backup or 802.3ad) for DRBD over two 10G interfaces
- (We also replaced the network components, i.e., switches, cables, etc., between the Proxmox virtualization hosts.)
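Since the full interfaces file is only in the attachment, here is a minimal sketch of the kind of bond stanza we use in /etc/network/interfaces (interface names and the address are placeholders, not our actual values):

auto bond0
iface bond0 inet static
    address 10.0.0.1/24
    bond-slaves ens1f0 ens1f1
    bond-mode active-backup
    bond-miimon 100
    # on the 802.3ad clusters instead:
    # bond-mode 802.3ad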

I am running the newest version and patches of PVE:
Linux 4.15.18-10-pve #1 SMP PVE 4.15.18-32 (Sat, 19 Jan 2019 10:09:37 +0100) x86_64

The DRBD version:
cat /proc/drbd
version: 9.0.16-1 (api:2/proto:86-114)
GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root at esproxmox01a, 2019-01-28 10:00:32
Transports (api:16): tcp (9.0.16-1)

The DRBD settings are as follows (the error occurs even if these values are unset):
drbdmanage peer-device-options --common --c-fill-target 10000;
drbdmanage peer-device-options --common --c-max-rate   700000;
drbdmanage peer-device-options --common --c-plan-ahead    7;
drbdmanage peer-device-options --common --c-min-rate     4000;
drbdmanage peer-device-options --common --resync-rate  500000;
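In case it helps with diagnosis, we cross-checked the effective values on each node with drbdsetup (the resource name below is only a placeholder):

drbdsetup show <resource-name>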

No additional settings are configured.
DRBD volumes and resources are managed with drbdmanage:
cat /etc/drbdmanaged.cfg
[LOCAL]
storage-plugin = drbdmanage.storage.lvm.Lvm
force=1

The network settings are in the attached file.

Is there anything I have to change, or is DRBD over bonded interfaces not supported?

Thanks in advance for your help.

Kind regards

Michael Köster
Group Leader
______________________________________________________

Infrastructure + Service
Datacenter Service
Server + Data

FUNKE Corporate IT GmbH
Hintern Brüdern 23, 38100 Braunschweig

Phone:    +49 531 3900 224
Fax:      +49 531 3900 148
E-Mail:   michael.koester at funkemedien.de


____________________________________________________
FUNKE Corporate IT GmbH
Jacob-Funke-Platz 1, 45127 Essen
Registered office Essen, Commercial Register: Essen HRB 23241
Managing Directors: Michael Kurowski, Andreas Schoo, Michael Wüller

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iKVM_capture.jpg
Type: image/jpeg
Size: 665900 bytes
Desc: iKVM_capture.jpg
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbdctrl.res
Type: application/octet-stream
Size: 1008 bytes
Desc: drbdctrl.res
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interfaces.txt
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.txt>