[DRBD-user] DRBD critical error: CPU locks on bonded network interface

Köster, Michael Michael.Koester at funkemedien.de
Mon Feb 4 12:08:08 CET 2019

Dear all,

I have most probably found a bug that occurs when using DRBD over a bonded network interface. The error occurred rarely with 9.0.14-1 and occurs almost always with 9.0.16-1. (With 9.0.14-1, a reboot sometimes fixes the issue.)

Result of the bug: the DRBD module locks up the CPUs and the kernel starts killing processes.

 kernel:[   83.811642] NMI watchdog: Watchdog detected hard LOCKUP on cpu 34
(Sometimes it is a soft lockup instead.)
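In case it helps with debugging, I can enable all-CPU backtraces for the watchdog, so the next lockup shows where the other CPUs are stuck. A sketch (these sysctls exist in 4.15 kernels, but I have not verified them on this exact PVE build; the file name is hypothetical):

```
# /etc/sysctl.d/90-lockup-debug.conf
# Dump backtraces of all CPUs when the watchdog fires, not just the locked one.
kernel.softlockup_all_cpu_backtrace = 1
kernel.hardlockup_all_cpu_backtrace = 1
```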

Attached is an image of a system start. The cluster is resyncing, and the error occurs while the server is still in the middle of this resync.

We have several Proxmox clusters in different locations (with different network and server hardware), and this issue arises on every cluster despite the varying setups:
- Cluster members (2 or 4)
- Servers (Dell R730 or Thomas Krenn servers)
- Network bond (bond-mode active-backup or 802.3ad) for DRBD over two 10G interfaces
- (We also replaced the network components, i.e. switches, cables, etc., between the Proxmox virtualization hosts)
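For reference, the bond on the DRBD link is set up roughly like this in /etc/network/interfaces (a sketch; interface names and the address are placeholders, the real configuration is in the attached interfaces.txt):

```
# Hypothetical example; see the attached interfaces.txt for the actual config.
auto bond1
iface bond1 inet static
    address 10.0.0.1/24
    bond-slaves enp3s0f0 enp3s0f1
    bond-mode 802.3ad
    bond-miimon 100
```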

I am using the newest version and patches of PVE:
Linux 4.15.18-10-pve #1 SMP PVE 4.15.18-32 (Sat, 19 Jan 2019 10:09:37 +0100) x86_64

The DRBD version:
cat /proc/drbd
version: 9.0.16-1 (api:2/proto:86-114)
GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root at esproxmox01a, 2019-01-28 10:00:32
Transports (api:16): tcp (9.0.16-1)

The DRBD settings are (the error occurs even if these values are left unset):
drbdmanage peer-device-options --common --c-fill-target 10000;
drbdmanage peer-device-options --common --c-max-rate   700000;
drbdmanage peer-device-options --common --c-plan-ahead    7;
drbdmanage peer-device-options --common --c-min-rate     4000;
drbdmanage peer-device-options --common --resync-rate  500000;
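For context, if I read the documentation correctly, DRBD interprets bare numbers for these rate options as KiB/s, so the configured caps translate to roughly:

```shell
# Sanity check of the configured sync-rate caps, assuming the default
# unit for bare numbers is KiB/s (per my reading of the DRBD docs).
c_max_rate_kib=700000
resync_rate_kib=500000
echo "c-max-rate  ~ $((c_max_rate_kib / 1024)) MiB/s"   # ~683 MiB/s
echo "resync-rate ~ $((resync_rate_kib / 1024)) MiB/s"  # ~488 MiB/s
```

That is well within what the dual-10G bond should be able to carry, so the caps themselves should not be saturating the link.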

No additional settings are set.
DRBD volumes and resources are managed with drbdmanage:
cat /etc/drbdmanaged.cfg
storage-plugin = drbdmanage.storage.lvm.Lvm

The network settings are in the attached file.

Is there anything I have to change, or is DRBD over bonded interfaces not supported?

Thanks in advance for your help.

Kind regards

Michael Köster

Infrastruktur + Service
Datacenter Service
Server + Daten

FUNKE Corporate IT GmbH
Hintern Brüdern 23, 38100 Braunschweig

Phone:    +49 531 3900 224
Fax:      +49 531 3900 148
E-Mail:   michael.koester at funkemedien.de

FUNKE Corporate IT GmbH
Jacob-Funke-Platz 1, 45127 Essen
Registered office Essen, Registry Court Essen HRB 23241
Managing Directors: Michael Kurowski, Andreas Schoo, Michael Wüller

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iKVM_capture.jpg
Type: image/jpeg
Size: 665900 bytes
Desc: iKVM_capture.jpg
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbdctrl.res
Type: application/octet-stream
Size: 1008 bytes
Desc: drbdctrl.res
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interfaces.txt
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20190204/10ea403e/attachment-0001.txt>