[DRBD-user] frequent wrong magic value with kernel >4.9

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jan 9 16:24:20 CET 2018


On Tue, Jan 09, 2018 at 03:36:34PM +0100, Lars Ellenberg wrote:
> On Mon, Dec 25, 2017 at 03:19:42PM +0100, Andreas Pflug wrote:
> > Running two Debian 9.3 machines, directly connected via 10GBit on-board
> > X540 10GBit, with 15 drbd devices.
> > 
> > When running a 4.14.2 kernel (from sid) or a 4.13.13 kernel (from
> > stretch-backports), I see several "Wrong magic value 0x4c414245 in
> > protocol version 101" per day issued by the secondary, with subsequent
> > termination of the connection, reconnect and resync. The magic value
> > logged differs, quite often 0x00.
> > 
> > Using the current 4.9.65 kernel (or older) from stretch didn't show
> > these aborts in the past, and after going back they're gone again. It
> > seems to be some problem introduced after 4.9 kernels, since both 4.9
> > and 4.13 include drbd 8.4.7. Maybe some interference with the nic driver?
> > 
> > Kernel   drbd    ixgbe    errors
> > 4.9.65   8.4.7   4.4.0-k  no
> > 4.13.13  8.4.7   5.1.0-k  yes
> > 4.14.2   8.4.10  5.1.0-k  yes
> 
> "strange".
> 
> What does "lsblk -D" and "lsblk -t" say?
> 
> Do you have a scratch volume you can play with?
> As a datapoint, can you try to "blkdiscard /dev/drbdX" it?
> dd if=/dev/zero of=/dev/drbdX bs=1G oflag=direct count=1?
> 
> Something like that?
> Any "easy" reproducer?

Maybe while preparing the pull requests for upstream,
we missed/mangled/broke something.

Can you also reproduce with "out-of-tree" drbd 8.4.10?
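
As an aside on the logged magic values quoted above: it can help to
look at the raw bytes of each value, since a value that decodes to
printable ASCII might suggest that payload data was parsed as a packet
header. The following is only a minimal sketch. The helper names are
hypothetical, and it assumes the logged number is the 32-bit field in
big-endian (wire) byte order.

```python
import re

# Message text exactly as quoted in the report above.
MESSAGE = "Wrong magic value 0x4c414245 in protocol version 101"


def extract_magic(message):
    """Return the logged magic value as an int, or None if the line has none."""
    match = re.search(r"Wrong magic value 0x([0-9a-fA-F]+)", message)
    return int(match.group(1), 16) if match else None


def magic_as_text(magic):
    """Render a 32-bit value as 4 bytes (big-endian assumed).

    Bytes outside the printable ASCII range are shown as '.'.
    """
    raw = magic.to_bytes(4, "big")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    magic = extract_magic(MESSAGE)
    print(hex(magic), repr(magic_as_text(magic)))
```

Running the same helper over every "Wrong magic value" line in the
kernel log would show whether the bad values are mostly zeros or
mostly text-like data.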

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed
