[DRBD-user] Constant wrong magic value errors in kernel buffer
Глеб Смирнов
northerntech.am at gmail.com
Wed Feb 5 19:36:19 CET 2020
Greetings everyone,
We are running a small 3 node drbd 9.0.21 cluster, used for opennebula
vm storage.
Lately we have started recieving messages from one of our servers, about
getting a "Wrong magic value 0x52464200 in protocol version 86". We are
using lacp bonded 2 gig links for traffic exchange between drbd hosts. Both
lacp links are working fine and no outages were detected previously. It's
also interesting to point out that drbd9 seems to "ignore" the problematic
node and doesn't place any vm's on it.
Here is a list of kernel versions on our servers:
100 - 4.15.0-72-generic #81-Ubuntu SMP #controller node
101 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 #storage node
102 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 #storage node
We tried changing network rendering engines, checking /tmp dir size, making
non-lacp network between servers, and searching for network inconsistencies
and even used different networking equipment but unfortunately, we haven't
achieved any results.
The problem is highly reproducible in our environment ( even after fully
reinstalling drbd, we still get magic value error ether on 101 or 102,
while resources are avoiding deployment on 102).
We get following logs in kernel ring buffer on 102:
[Wed Feb 5 21:02:29 2020] drbd one-image-10: susp-io( no -> user)
[Wed Feb 5 21:02:32 2020] drbd one-image-10: susp-io( user -> no)
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0: Starting worker thread
(from drbdsetup [20138])
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting
sender thread (from drbdsetup [20143])
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting
sender thread (from drbdsetup [20146])
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn(
StandAlone -> Unconnected )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting
receiver thread (from drbd_w_one-vm-1 [20139])
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn(
Unconnected -> Connecting )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
StandAlone -> Unconnected )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting
receiver thread (from drbd_w_one-vm-1 [20139])
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
Unconnected -> Connecting )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Failed
to initiate connection, err=-98
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn(
Connecting -> Disconnecting )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Aborting
remote state change 0 commit not possible
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100:
Restarting sender thread
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100:
Connection closed
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn(
Disconnecting -> StandAlone )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100:
Terminating receiver thread
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Failed
to initiate connection, err=-98
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
Connecting -> Disconnecting )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting
remote state change 0 commit not possible
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Restarting sender thread
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Connection closed
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
Disconnecting -> StandAlone )
[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Terminating receiver thread
[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100:
Terminating sender thread
[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Terminating sender thread
[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0/0 drbd1099:
drbd_bm_resize called with capacity == 0
[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0: Terminating worker
thread
Seems quite a similar case described here:
https://lists.linbit.com/pipermail/drbd-user/2009-October/012777.html
At the same time on 100 (controller node):
[Wed Feb 5 21:02:29 2020] drbd one-image-10: susp-io( no -> user)
[Wed Feb 5 21:02:33 2020] drbd one-image-10: susp-io( user -> no)
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0: Starting worker thread
(from drbdsetup [3965])
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting
sender thread (from drbdsetup [3971])
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting
sender thread (from drbdsetup [3975])
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: meta-data IO
uses: blk-bio
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk(
Diskless -> Attaching )
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: Maximum
number of peer devices = 7
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0: Method to ensure write
ordering: flush
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099:
drbd_bm_resize called with capacity == 4201520
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: resync
bitmap: bits=525190 words=57449 pages=113
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052
MB (2100760 KB)
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052
MB (2100760 KB)
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: recounting
of set bits took additional 0ms
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk(
Attaching -> UpToDate )
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: attached to
current UUID: C518769AA829C854
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
StandAlone -> Unconnected )
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting
receiver thread (from drbd_w_one-vm-1 [3966])
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
Unconnected -> Connecting )
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
StandAlone -> Unconnected )
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting
receiver thread (from drbd_w_one-vm-1 [3966])
[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
Unconnected -> Connecting )
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Handshake to peer 0 successful: Agreed network protocol version 116
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Feature
flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME
WRITE_ZEROES.
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Peer
authenticated using 20 bytes HMAC
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting
ack_recv thread (from drbd_r_one-vm-1 [4004])
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Preparing remote state change 1265772256
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Committing remote state change 1265772256 (primary_nodes=0)
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
Connecting -> Connected ) peer( Unknown -> Secondary )
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099
nts-cloud-n101: drbd_sync_handshake:
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099
nts-cloud-n101: self
C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0
flags:20
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099
nts-cloud-n101: peer
C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0
flags:20
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099
nts-cloud-n101: uuid_compare()=no-sync by rule 40
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( no
-> yes )
[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099
nts-cloud-n101: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong
magic value 0x52464220 in protocol version 86
[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
Connecting -> NetworkFailure )
[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Restarting sender thread
[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Connection closed
[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
NetworkFailure -> Unconnected )
[Wed Feb 5 21:02:49 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
Unconnected -> Connecting )
[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong
magic value 0x52464220 in protocol version 86
[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
Connecting -> NetworkFailure )
[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Restarting sender thread
[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Connection closed
[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
NetworkFailure -> Unconnected )
[Wed Feb 5 21:03:02 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
Unconnected -> Connecting )
[Wed Feb 5 21:03:11 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong
magic value 0x52464220 in protocol version 86
<last 6 entries repeat for a while>
[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Restarting sender thread
[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Connection closed
[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn(
Disconnecting -> StandAlone )
[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Terminating receiver thread
[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102:
Terminating sender thread
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: Preparing cluster-wide
state change 1186333411 (1->0 496/16)
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: State change
1186333411: primary_nodes=0, weak_nodes=0
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Cluster
is now split
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: Committing cluster-wide
state change 1186333411 (0ms)
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( yes
-> no )
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099
nts-cloud-n101: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
ack_receiver terminated
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Terminating ack_recv thread
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting
remote state change 0 commit not possible
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Restarting sender thread
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Connection closed
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn(
Disconnecting -> StandAlone )
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Terminating receiver thread
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101:
Terminating sender thread
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk(
UpToDate -> Detaching )
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk(
Detaching -> Diskless )
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099:
drbd_bm_resize called with capacity == 0
[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: Terminating worker
thread
drbdadm status on 100:
one-vm-1385-disk-0 role:Secondary
disk:UpToDate
nts-cloud-n101 role:Secondary
peer-disk:UpToDate
nts-cloud-n102 connection:Connecting
Best of all,
Gleb.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20200205/9eac906e/attachment-0001.htm>
More information about the drbd-user
mailing list