<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><meta content="text/html;charset=UTF-8" http-equiv="Content-Type"></head><body ><div style="font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10pt;"><div><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div style="color: rgb(0,0,0);font-family: 'Lucida Grande', Helvetica, Arial, sans-serif;font-size: 12.0px;font-style: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-indent: 0.0px;text-transform: none;white-space: normal;widows: 2;word-spacing: 0.0px;"><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div>Greetings everyone,<br></div><div><span class="x_285737916mceItemHidden">We are running a small 3-node DRBD 9.0.21 cluster used for OpenNebula VM storage. Lately we have started receiving "Wrong magic value 0x52464220 in protocol version 86" messages from one of our servers. We use LACP-bonded 2 gig links for traffic between the DRBD hosts. Both LACP links are working fine, and no outages have been detected so far. 
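</span></div><div><br></div><div>One way to read this error is to decode the reported magic value as the four bytes that actually arrived from the peer. The Python sketch below does that with a hypothetical helper, <code>magic_to_ascii</code> (our own name, not a DRBD tool), and it assumes DRBD prints the value in network (big-endian) byte order. Under that assumption, 0x52464220 as it appears in the logs decodes to the ASCII text "RFB ", which is also how the greeting of the VNC Remote Framebuffer protocol begins ("RFB 003.008", RFC 6143). That may hint that something other than a DRBD peer answered on the port DRBD connected to, but this is an observation rather than a confirmed diagnosis.<br></div>
<pre>
```python
import struct


def magic_to_ascii(magic: int) -> str:
    """Render a 32-bit magic value as the four bytes it stands for on the wire.

    Assumes the value was printed in network (big-endian) byte order, so the
    most significant byte is the first byte that arrived on the socket.
    """
    raw = struct.pack("!I", magic)  # "!" = network byte order, "I" = unsigned 32-bit
    # Keep printable ASCII (0x20..0x7e) as-is; escape every other byte as \xNN.
    return "".join(chr(b) if b in range(0x20, 0x7F) else f"\\x{b:02x}" for b in raw)


if __name__ == "__main__":
    # Value copied from the kernel log on node 100.
    print(repr(magic_to_ascii(0x52464220)))  # prints 'RFB '
```
</pre>
<div>A byte-wise decode is used instead of <code>int.to_bytes(4, "big").decode("ascii")</code> so that non-printable bytes, such as those in a genuine DRBD header, show up as escapes instead of raising <code>UnicodeDecodeError</code>.<br></div><div><br></div><div><span class="x_285737916mceItemHidden">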
It's also interesting to point out that DRBD 9 seems to "ignore" the problematic node and doesn't place any VMs on it.</span><br></div><div><br></div><div><span class="x_285737916mceItemHidden">Here is a list of kernel versions on our servers:</span><br></div><div>100 - 4.15.0-72-generic #81-Ubuntu SMP (controller node)<br></div><div>101 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (storage node)<br></div><div>102 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (storage node)<br></div><div><br></div><div><span class="x_285737916mceItemHidden">We have tried switching network renderers, checking the size of the /tmp directory, setting up a non-LACP network between the servers, searching for network inconsistencies, and even using different networking equipment, but unfortunately none of this has helped.</span><br></div><div><span class="x_285737916mceItemHidden">The problem is highly reproducible in our environment: even after fully reinstalling DRBD, we still get the magic value error on either 101 or 102, and resources avoid being deployed on 102.</span><br></div><div><br></div><div><br></div><div>We get the following logs in the kernel ring buffer on 102:<br></div><div>[Wed Feb 5 21:02:29 2020] drbd one-image-10: susp-io( no -> user) <br></div></div></div></div><div>[Wed Feb 5 21:02:32 2020] drbd 
one-image-10: susp-io( user -> no)<br></div></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0: Starting worker thread (from drbdsetup [20138])<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting sender thread (from drbdsetup [20143])<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting sender thread (from drbdsetup [20146])<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( StandAlone -> Unconnected )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting receiver thread (from drbd_w_one-vm-1 [20139])<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( Unconnected -> Connecting )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( StandAlone -> Unconnected )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting receiver thread (from drbd_w_one-vm-1 [20139])<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Unconnected -> Connecting )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Failed to initiate connection, err=-98<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( Connecting -> Disconnecting )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Aborting remote state change 0 commit not possible<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Restarting sender thread<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Connection closed<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( Disconnecting -> StandAlone )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Terminating receiver thread<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 
nts-cloud-n101: Failed to initiate connection, err=-98<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Connecting -> Disconnecting )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting remote state change 0 commit not possible<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Restarting sender thread<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Connection closed<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Disconnecting -> StandAlone )<br></div><div>[Wed Feb 5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating receiver thread<br></div><div>[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Terminating sender thread<br></div><div>[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating sender thread<br></div><div>[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize called with capacity == 0<br></div><div><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div style="color: rgb(0,0,0);font-family: 'Lucida Grande', Helvetica, Arial, sans-serif;font-size: 12.0px;font-style: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-indent: 0.0px;text-transform: none;white-space: normal;widows: 2;word-spacing: 0.0px;"><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div>[Wed Feb 5 21:07:35 2020] drbd one-vm-1385-disk-0: Terminating worker thread<br></div><div><br></div><div>This looks quite similar to the case described here: <a href="https://lists.linbit.com/pipermail/drbd-user/2009-October/012777.html" target="_blank">https://lists.linbit.com/pipermail/drbd-user/2009-October/012777.html</a><br></div><div><br></div><div>At the same time on 100 (controller node):<br></div><div>[Wed Feb 5 21:02:29 2020] drbd one-image-10: susp-io( no -> user) 
<br></div></div></div></div></div><div>[Wed Feb 5 21:02:33 2020] drbd one-image-10: susp-io( user -> no)<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0: Starting worker thread (from drbdsetup [3965])<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting sender thread (from drbdsetup [3971])<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting sender thread (from drbdsetup [3975])<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: meta-data IO uses: blk-bio<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Diskless -> Attaching )<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: Maximum number of peer devices = 7<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0: Method to ensure write ordering: flush<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize called with capacity == 4201520<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: resync bitmap: bits=525190 words=57449 pages=113<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052 MB (2100760 KB)<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052 MB (2100760 KB)<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: recounting of set bits took additional 0ms<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Attaching -> UpToDate )<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: attached to current UUID: C518769AA829C854<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( StandAlone -> Unconnected )<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting receiver thread (from drbd_w_one-vm-1 [3966])<br></div><div>[Wed Feb 5 21:02:36 2020] drbd 
one-vm-1385-disk-0 nts-cloud-n101: conn( Unconnected -> Connecting )<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( StandAlone -> Unconnected )<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting receiver thread (from drbd_w_one-vm-1 [3966])<br></div><div>[Wed Feb 5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Unconnected -> Connecting )<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Handshake to peer 0 successful: Agreed network protocol version 116<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Peer authenticated using 20 bytes HMAC<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting ack_recv thread (from drbd_r_one-vm-1 [4004])<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Preparing remote state change 1265772256<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Committing remote state change 1265772256 (primary_nodes=0)<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Connecting -> Connected ) peer( Unknown -> Secondary )<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: drbd_sync_handshake:<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: self C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: peer C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: uuid_compare()=no-sync by rule 
40<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( no -> yes )<br></div><div>[Wed Feb 5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )<br></div><div><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div style="color: rgb(0,0,0);font-family: 'Lucida Grande', Helvetica, Arial, sans-serif;font-size: 12.0px;font-style: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-indent: 0.0px;text-transform: none;white-space: normal;widows: 2;word-spacing: 0.0px;"><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div>[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic value 0x52464220 in protocol version 86 <br></div></div></div></div></div><div>[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Connecting -> NetworkFailure )<br></div><div>[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting sender thread<br></div><div>[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection closed<br></div><div>[Wed Feb 5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( NetworkFailure -> Unconnected )<br></div><div>[Wed Feb 5 21:02:49 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Unconnected -> Connecting )<br></div><div>[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic value 0x52464220 in protocol version 86<br></div><div>[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Connecting -> NetworkFailure )<br></div><div>[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting sender thread<br></div><div>[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection closed<br></div><div>[Wed Feb 5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( NetworkFailure -> Unconnected )<br></div><div>[Wed Feb 5 21:03:02 
2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Unconnected -> Connecting )<br></div><div><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div style="color: rgb(0,0,0);font-family: 'Lucida Grande', Helvetica, Arial, sans-serif;font-size: 12.0px;font-style: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-indent: 0.0px;text-transform: none;white-space: normal;widows: 2;word-spacing: 0.0px;"><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div>[Wed Feb 5 21:03:11 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic value 0x52464220 in protocol version 86<br></div><div>&lt;last 6 entries repeat for a while&gt;</div><div>[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting sender thread <br></div></div></div></div></div><div>[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection closed<br></div><div>[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( Disconnecting -> StandAlone )<br></div><div>[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Terminating receiver thread<br></div><div>[Wed Feb 5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Terminating sender thread<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: Preparing cluster-wide state change 1186333411 (1->0 496/16)<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: State change 1186333411: primary_nodes=0, weak_nodes=0<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Cluster is now split<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: Committing cluster-wide state change 1186333411 (0ms)<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( yes -> no )<br></div><div>[Wed Feb 5 21:07:37 
2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: ack_receiver terminated<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating ack_recv thread<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting remote state change 0 commit not possible<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Restarting sender thread<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Connection closed<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( Disconnecting -> StandAlone )<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating receiver thread<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating sender thread<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( UpToDate -> Detaching )<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Detaching -> Diskless )<br></div><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize called with capacity == 0<br></div><div><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div style="color: rgb(0,0,0);font-family: 'Lucida Grande', Helvetica, Arial, sans-serif;font-size: 12.0px;font-style: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-indent: 0.0px;text-transform: none;white-space: normal;widows: 2;word-spacing: 0.0px;"><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div>[Wed Feb 5 21:07:37 2020] drbd one-vm-1385-disk-0: Terminating worker thread<br></div><div><br></div><div>drbdadm status on 100:<br></div><div>one-vm-1385-disk-0 role:Secondary <br></div></div></div></div></div><div> 
disk:UpToDate<br></div><div> nts-cloud-n101 role:Secondary<br></div><div> peer-disk:UpToDate<br></div><div><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div style="color: rgb(0,0,0);font-family: 'Lucida Grande', Helvetica, Arial, sans-serif;font-size: 12.0px;font-style: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-indent: 0.0px;text-transform: none;white-space: normal;widows: 2;word-spacing: 0.0px;"><div style="font-family: Verdana, Arial, Helvetica, sans-serif;font-size: 10.0pt;"><div> nts-cloud-n102 connection:Connecting<br></div><div><br></div><div><br></div><div>Best regards,<br></div><div>Gleb<br></div></div><div><br></div></div><div><br></div></div></div><div><br></div></div><br></body></html>