<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
On 09.01.18 at 16:24, Lars Ellenberg wrote:<br>
<span style="white-space: pre-wrap; display: block; width: 98vw;">> On Tue, Jan 09, 2018 at 03:36:34PM +0100, Lars Ellenberg wrote:
>> On Mon, Dec 25, 2017 at 03:19:42PM +0100, Andreas Pflug wrote:
>>> Running two Debian 9.3 machines with 15 drbd devices, directly
>>> connected via on-board X540 10GBit NICs.
>>>
>>> When running a 4.14.2 kernel (from sid) or a 4.13.13 kernel
>>> (from stretch-backports), I see several "Wrong magic value
>>> 0x4c414245 in protocol version 101" errors per day, issued by the
>>> secondary, with subsequent termination of the connection,
>>> reconnect and resync. The logged magic value varies; quite often
>>> it is 0x00.
>>>
>>> The current 4.9.65 kernel (or older) from stretch didn't show
>>> these aborts in the past, and after going back to it they're gone
>>> again. It seems to be a problem introduced after the 4.9 kernels,
>>> since both 4.9 and 4.13 include drbd 8.4.7. Maybe some
>>> interference with the NIC driver?
>>>
>>> Kernel    drbd    ixgbe    errors
>>> 4.9.65    8.4.7   4.4.0-k  no
>>> 4.13.13   8.4.7   5.1.0-k  yes
>>> 4.14.2    8.4.10  5.1.0-k  yes
>>
>> "strange".
>>
>> What does "lsblk -D" and "lsblk -t" say?
</span>
<span style="white-space: pre-wrap; display: block; width: 98vw;">
NAME                 ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
sda                          0 262144 262144     512     512    1 cfq       128 128    0B
└─sda1                       0 262144 262144     512     512    1 cfq       128 128    0B
  ├─local-stresstest         0 262144 262144     512     512    1           128 128    0B
  │ └─drbd16                 0 262144 262144     512     512    1           128 128    0B

NAME                 DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                         0        0B       0B         0
└─sda1                      0        0B       0B         0
  ├─local-stresstest        0        0B       0B         0
  │ └─drbd16                0        0B       0B         0
</span>
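<br>
As an aside on the errors quoted above: taken as four big-endian bytes,
the magic 0x4c414245 is the ASCII string "LABE" (which happens to be the
start of the LVM label signature "LABELONE"), so it looks like stray data
rather than any real DRBD magic. A trivial, purely illustrative check:<br>
<span style="white-space: pre-wrap; display: block; width: 98vw;">
# Decode the bogus "magic" as raw bytes (Python).
magic = 0x4C414245
print(magic.to_bytes(4, "big"))   # b'LABE'
</span>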
<span style="white-space: pre-wrap; display: block; width: 98vw;">>>
>>
>> Do you have a scratch volume you can play with? As a datapoint, can
>> you try to "blkdiscard /dev/drbdX" it?
</span><br>
<span style="white-space: pre-wrap; display: block; width: 98vw;">
blkdiscard: /dev/drbd16: BLKDISCARD ioctl failed: Operation not supported
</span>
It's hosted on LVM on a hardware RAID6 array.<br>
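<br>
That failure is consistent with the lsblk output above, where DISC-GRAN
and DISC-MAX are 0B all the way down the stack. For completeness, a quick
way to dump what each layer advertises via sysfs; just a sketch, and the
device names below are examples that would need adjusting to the actual
stack:<br>
<span style="white-space: pre-wrap; display: block; width: 98vw;">
# Sketch: print the discard limits each layer advertises via sysfs.
# The device names are examples only; adjust them to the real stack.
from pathlib import Path

for dev in ("sda", "dm-0", "drbd16"):
    q = Path(f"/sys/block/{dev}/queue")
    gran = (q / "discard_granularity").read_text().strip()
    mx = (q / "discard_max_bytes").read_text().strip()
    print(f"{dev}: discard_granularity={gran}, discard_max_bytes={mx}")
</span>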
<br>
<span style="white-space: pre-wrap; display: block; width: 98vw;">>>
>> dd if=/dev/zero of=/dev/drbdX bs=1G oflag=direct count=1?
</span>
<span style="white-space: pre-wrap; display: block; width: 98vw;">
dd if=/dev/zero of=/dev/drbd16 bs=1M count=3072 oflag=direct
</span>
Running this several times gives ~300 MB/s and no problems.<br>
<br>
This was executed with the primary on kernel 4.9.65 and the secondary on
4.14.7 (stretch-backports). It seems that zeroes don't trigger the
problem.<br>
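<br>
If it's the payload that matters, a possible next step would be to repeat
the same test with random data instead of zeroes. A sketch of that, under
the assumption that /dev/drbd16 is still expendable (O_DIRECT needs an
aligned buffer, hence the mmap):<br>
<span style="white-space: pre-wrap; display: block; width: 98vw;">
# Sketch: O_DIRECT writes of random (non-zero) data, same geometry as the
# dd run above. Assumes /dev/drbd16 is still an expendable scratch device!
import mmap
import os

DEV = "/dev/drbd16"
BS = 1024 * 1024      # 1 MiB blocks, as in the dd test
COUNT = 3072          # 3 GiB total

fd = os.open(DEV, os.O_WRONLY | os.O_DIRECT)
buf = mmap.mmap(-1, BS)               # page-aligned, as O_DIRECT requires
try:
    for _ in range(COUNT):
        buf.seek(0)
        buf.write(os.urandom(BS))     # random payload instead of zeroes
        os.write(fd, buf)
finally:
    buf.close()
    os.close(fd)
</span>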
<br>
<span style="white-space: pre-wrap; display: block; width: 98vw;">>
> Maybe while preparing the pull requests for upstream, we
> missed/mangled/broke something.
>
> Can you also reproduce with "out-of-tree" drbd 8.4.10?
</span><br>
Since my post to drbd-user didn't make it to the list for two weeks, I
missed the week after Christmas when everybody was on holiday. The
system is now back in full production, so I'm uncomfortable doing too
much testing.<br>
<br>
Regards,<br>
Andreas<br>
</body>
</html>