[DRBD-user] frequent wrong magic value with kernel >4.9

Andreas Pflug pgadmin at pse-consulting.de
Wed Jan 10 10:14:26 CET 2018


Am 09.01.18 um 16:24 schrieb Lars Ellenberg:
> On Tue, Jan 09, 2018 at 03:36:34PM +0100, Lars Ellenberg wrote:
>> On Mon, Dec 25, 2017 at 03:19:42PM +0100, Andreas Pflug wrote:
>>> Running two Debian 9.3 machines, directly connected via 10GBit
>>> on-board X540 10GBit, with 15 drbd devices.
>>>
>>> When running a 4.14.2 kernel (from sid) or a 4.13.13 kernel (from
>>> stretch-backports), I see several "Wrong magic value 0x4c414245 in
>>> protocol version 101" per day issued by the secondary, with
>>> subsequent termination of the connection, reconnect and resync. The
>>> magic value logged differs, quite often 0x00.
>>>
>>> Using the current 4.9.65 kernel (or older) from stretch didn't show
>>> these aborts in the past, and after going back they're gone again.
>>> It seems to be some problem introduced after 4.9 kernels, since both
>>> 4.9 and 4.13 include drbd 8.4.7. Maybe some interference with the
>>> nic driver?
>>>
>>> Kernel    drbd    ixgbe    errors
>>> 4.9.65    8.4.7   4.4.0-k  no
>>> 4.13.13   8.4.7   5.1.0-k  yes
>>> 4.14.2    8.4.10  5.1.0-k  yes
>>
>> "strange".
>>
>> What does "lsblk -D" and "lsblk -t" say?

NAME                     ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
sda                              0 262144 262144     512     512    1 cfq       128 128    0B
└─sda1                           0 262144 262144     512     512    1 cfq       128 128    0B
  ├─local-stresstest             0 262144 262144     512     512    1           128 128    0B
  │ └─drbd16                     0 262144 262144     512     512    1           128 128    0B

NAME                     DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO

sda                             0        0B       0B         0
└─sda1                          0        0B       0B         0
  ├─local-stresstest            0        0B       0B         0
  │ └─drbd16                    0        0B       0B         0

>> Do you have a scratch volume you can play with? As a datapoint,
>> you try to "blkdiscard /dev/drbdX" it?
blkdiscard: /dev/drbd16: BLKDISCARD ioctl failed: Operation not supported
It's hosted on LVM on a hardware raid6 disk.
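The EOPNOTSUPP from BLKDISCARD is consistent with the "lsblk -D" output above, where DISC-GRAN and DISC-MAX are 0B all the way down the stack. As a hedged sketch (the helper name and the idea of passing the sysfs value as an argument are illustrative, not from the thread), the kernel's view can be read from /sys/block/<dev>/queue/discard_granularity, which is 0 when the device does not support discard:

```shell
# Illustrative helper: decide from a device's discard_granularity value
# (as read from /sys/block/<dev>/queue/discard_granularity) whether a
# BLKDISCARD ioctl, and hence blkdiscard(8), can succeed on it.
discard_supported() {
    gran="$1"
    if [ "${gran:-0}" -gt 0 ] 2>/dev/null; then
        echo "discard supported (granularity ${gran} bytes)"
    else
        # 0 means the stack does not advertise discard, matching the
        # "Operation not supported" error seen above.
        echo "discard not supported"
    fi
}

# Real usage would be, e.g.:
#   discard_supported "$(cat /sys/block/sda/queue/discard_granularity)"
discard_supported 0
```

On a hardware RAID6 volume behind LVM, as here, a granularity of 0 at the bottom of the stack is expected unless the controller passes discards through.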

>> dd if=/dev/zero of=/dev/drbdX bs=1G oflag=direct count=1?

dd if=/dev/zero of=/dev/drbd16 bs=1M count=3072 oflag=direct, run several
times, gives ~300MB/s and no problem.

This was executed with the primary server on 4.9.65 and the secondary on
4.14.7 (stretch-backports). It seems that zeroes don't trigger the problem.
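Since all-zero writes did not reproduce the corruption, a follow-up test with non-zero data might be more telling, in case all-zero buffers are special-cased somewhere in the stack. A hedged sketch (not from the thread): the target here is a temporary file as a safe stand-in; on a scratch volume it would be /dev/drbd16, with oflag=direct added to bypass the page cache (omitted below because direct I/O on a regular temp file may not be supported).

```shell
# Fill the target with pseudo-random (hence non-zero) data instead of
# zeroes. TARGET is a temp file stand-in; substitute a scratch DRBD
# device such as /dev/drbd16 for a real reproduction attempt.
TARGET=$(mktemp)
dd if=/dev/urandom of="$TARGET" bs=1M count=4 status=none
stat -c '%s bytes written' "$TARGET"   # 4194304 bytes written
```

Repeating this in a loop, like the dd zero test above, would distinguish "zeroes are fine" from "all writes are fine" on the affected kernels.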



> Maybe while preparing the pull requests for upstream, we
> missed/mangled/broke something.
>
> Can you also reproduce with "out-of-tree" drbd 8.4.10?
Since my post to drbd-user didn't make it to the list for two weeks, I
missed the week after Christmas when everybody was on holiday. The system
is now back in full production, so I'm uncomfortable doing too much
testing.

Regards,
Andreas



