[DRBD-user] Soft lockup CPU#2 stuck problems with DRBD 9 after rebooting other node

Thu Oct 20 00:54:39 CEST 2016

Hi,

I have done some more investigation but I am still having a lot of 
problems with CentOS 7, DRBD 9 and Xen. Is there anyone using the same 
combination without issues?

I did find a similar issue and suggestions in 
http://lists.linbit.com/pipermail/drbd-user/2015-April/021938.html, and 
tried disabling NIC offloading by using this:

ethtool -K eno1 tso off gso off

But as soon as I reboot one of the nodes or restart its network, the 
CPU stuck errors return, and I have to reboot all servers.
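
For reference, here is a rough sketch of how the offload change could be 
applied to more than one interface and checked afterwards. It is not a 
confirmed fix for the lockups. The helper names offloads_still_on and 
disable_offloads are made up for illustration, and eno2 is a hypothetical 
name for the second bonded adapter; only eno1 comes from the command 
above. Settings made with ethtool -K are not stored persistently, so they 
have to be reapplied after every reboot.

```shell
#!/bin/sh
# Read `ethtool -k <iface>` output on stdin and print the names of the
# TSO/GSO features that are still reported as "on".
offloads_still_on() {
    awk -F': *' '
        $1 ~ /^(tcp|generic)-segmentation-offload$/ && $2 ~ /^on/ { print $1 }
    '
}

# Disable TSO and GSO on every interface given as an argument, then
# warn on stderr if the driver still reports either feature as on.
disable_offloads() {
    for iface in "$@"; do
        ethtool -K "$iface" tso off gso off
        left=$(ethtool -k "$iface" | offloads_still_on)
        [ -z "$left" ] || echo "$iface: still on: $left" >&2
    done
}

# Example call (eno2 is an assumed name for the second bond slave):
# disable_offloads eno1 eno2
```

Running this from a boot-time script or a network hook would be one way 
to reapply the setting after a reboot or network restart. Whether that 
helps with the soft lockups is untested.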

Does anyone have any suggestions on what to try next?

Kind regards,

Maarten Bremer

> We have problems with our three node DRBD 9 setup with CentOS 7 and Xen
> 4. When one of our nodes is rebooted, or becomes unavailable, the other
> nodes freeze entirely without any information, or give the following
> message:
>
> kernel:NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [drbd_w_db:1469]
>
> They then require a reboot, sometimes crashing one of the other nodes
> again in the process. Not a fun thing in a HA setup...
>
> Does anyone have an idea what is going on, and what we can do to prevent
> this from happening?
>
> We are running:
>
> DRBD 9.0.4-1 (api:2/proto:86-112)
> CentOS 7, kernel 3.18.41-20.el7.x86_64
> Xen 4.6.3-3.el7
>
> I do not know if it is related, but we are using bonding (mode 1,
> active-backup) with two network adapters.