[DRBD-user] Constant wrong magic value errors in kernel buffer

Notespeak Fyyar notespeak at ntshosts.com
Wed Feb 5 19:33:11 CET 2020


Greetings everyone,

We are running a small 3 node drbd 9.0.21 cluster, used for opennebula vm storage. Lately we have started recieving messages from one of our servers, about getting a "Wrong magic value 0x52464200 in protocol version 86". We are using lacp bonded 2 gig links for traffic exchange between drbd hosts. Both lacp links are working fine and no outages were detected previously. It's also interesting to point out that drbd9 seems to "ignore" the problematic node and doesn't place any vm's on it.

​

Here is a list of kernel versions on our servers:

100 - 4.15.0-72-generic #81-Ubuntu SMP         #controller node

101 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 #storage node

102 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 #storage  node



We tried changing network rendering engines, checking /tmp dir size, making non-lacp network between servers, and searching for network inconsistencies and even used different networking equipment  but unfortunately, we haven't achieved any results.

The problem is highly reproducible in our environment ( even after fully reinstalling drbd, we still get magic value error ether on 101 or 102, while resources are avoiding deployment  on 102).





We get following logs in kernel ring buffer on 102:

[Wed Feb  5 21:02:29 2020] drbd one-image-10: susp-io( no -> user) 




[Wed Feb  5 21:02:32 2020] drbd one-image-10: susp-io( user -> no)


[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0: Starting worker thread (from drbdsetup [20138])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting sender thread (from drbdsetup [20143])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting sender thread (from drbdsetup [20146])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( StandAlone -> Unconnected )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting receiver thread (from drbd_w_one-vm-1 [20139])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( Unconnected -> Connecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( StandAlone -> Unconnected )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting receiver thread (from drbd_w_one-vm-1 [20139])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Unconnected -> Connecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Failed to initiate connection, err=-98

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( Connecting -> Disconnecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Aborting remote state change 0 commit not possible

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Restarting sender thread

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Connection closed

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( Disconnecting -> StandAlone )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Terminating receiver thread

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Failed to initiate connection, err=-98

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Connecting -> Disconnecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting remote state change 0 commit not possible

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Restarting sender thread

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Connection closed

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Disconnecting -> StandAlone )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating receiver thread

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Terminating sender thread

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating sender thread

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize called with capacity == 0

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0: Terminating worker thread



Seems quite a similar case described here: https://lists.linbit.com/pipermail/drbd-user/2009-October/012777.html



At the same time on 100 (controller node):

[Wed Feb  5 21:02:29 2020] drbd one-image-10: susp-io( no -> user) 





[Wed Feb  5 21:02:33 2020] drbd one-image-10: susp-io( user -> no)

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0: Starting worker thread (from drbdsetup [3965])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting sender thread (from drbdsetup [3971])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting sender thread (from drbdsetup [3975])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: meta-data IO uses: blk-bio

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Diskless -> Attaching )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: Maximum number of peer devices = 7

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0: Method to ensure write ordering: flush

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize called with capacity == 4201520

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: resync bitmap: bits=525190 words=57449 pages=113

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052 MB (2100760 KB)

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052 MB (2100760 KB)

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: recounting of set bits took additional 0ms

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Attaching -> UpToDate )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: attached to current UUID: C518769AA829C854

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( StandAlone -> Unconnected )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting receiver thread (from drbd_w_one-vm-1 [3966])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Unconnected -> Connecting )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( StandAlone -> Unconnected )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting receiver thread (from drbd_w_one-vm-1 [3966])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Unconnected -> Connecting )

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Handshake to peer 0 successful: Agreed network protocol version 116

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Peer authenticated using 20 bytes HMAC

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting ack_recv thread (from drbd_r_one-vm-1 [4004])

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Preparing remote state change 1265772256

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Committing remote state change 1265772256 (primary_nodes=0)

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Connecting -> Connected ) peer( Unknown -> Secondary )

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: drbd_sync_handshake:

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: self C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: peer C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: uuid_compare()=no-sync by rule 40

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( no -> yes )

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic value 0x52464220 in protocol version 86 





[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Connecting -> NetworkFailure )

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting sender thread

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection closed

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( NetworkFailure -> Unconnected )

[Wed Feb  5 21:02:49 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Unconnected -> Connecting )

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic value 0x52464220 in protocol version 86

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Connecting -> NetworkFailure )

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting sender thread

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection closed

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( NetworkFailure -> Unconnected )

[Wed Feb  5 21:03:02 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Unconnected -> Connecting )

[Wed Feb  5 21:03:11 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic value 0x52464220 in protocol version 86

<last 6 entries repeat for a while>
[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting sender thread 





[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection closed

[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Disconnecting -> StandAlone )

[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Terminating receiver thread

[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Terminating sender thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: Preparing cluster-wide state change 1186333411 (1->0 496/16)

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: State change 1186333411: primary_nodes=0, weak_nodes=0

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Cluster is now split

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: Committing cluster-wide state change 1186333411 (0ms)

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( yes -> no )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: ack_receiver terminated

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating ack_recv thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting remote state change 0 commit not possible

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Restarting sender thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Connection closed

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Disconnecting -> StandAlone )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating receiver thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating sender thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( UpToDate -> Detaching )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Detaching -> Diskless )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize called with capacity == 0

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: Terminating worker thread



drbdadm status on 100:

one-vm-1385-disk-0 role:Secondary 





  disk:UpToDate

  nts-cloud-n101 role:Secondary

    peer-disk:UpToDate

  nts-cloud-n102 connection:Connecting





Best of all,

Gleb.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20200205/d8763974/attachment-0001.htm>


More information about the drbd-user mailing list