<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="IT" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi all,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I have a 3-node DRBD cluster that has suffered a split-brain. I recovered all resources except one.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">For this resource (influxdb), the connections Node3-Node1 and Node3-Node2 are fine, but the connection Node1-Node2 does not come up: each of the two nodes reports its connection to the other as StandAlone.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 3<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">[root@pbzne4demo-n3 ~]# drbdadm status influxdb<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">influxdb role:Primary<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> disk:UpToDate<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n1.wp.lan role:Secondary<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> peer-disk:UpToDate<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n2.wp.lan role:Secondary<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> peer-disk:UpToDate<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 2<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">[root@pbzne4demo-n2 ~]# drbdadm status influxdb<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">influxdb role:Secondary<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> disk:UpToDate<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n1.wp.lan connection:StandAlone<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n3.wp.lan role:Primary<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> peer-disk:UpToDate<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 1<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">[root@pbzne4demo-n1 ~]# drbdadm status influxdb<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">influxdb role:Secondary<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> disk:UpToDate<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n2.wp.lan connection:StandAlone<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n3.wp.lan role:Primary<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> peer-disk:UpToDate<o:p></o:p></span></p>
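<p class="MsoNormal"><span lang="EN-US">A quick way to spot this state on each node might be to filter the status output for peers that carry a connection: field. This is only a sketch: it assumes, as in the output above, that drbdadm status prints connection: only for peers that are not Connected, and that only the resource line is unindented. The function name find_unconnected is my own, not part of drbd-utils.<o:p></o:p></span></p>
<pre>
```shell
#!/bin/sh
# find_unconnected (hypothetical helper): reads `drbdadm status` output on
# stdin and prints "RESOURCE PEER STATE" for every peer line that carries
# a connection: field.
# Assumption: drbdadm status shows connection: only for peers that are not
# Connected, and only the resource line starts in column 0.
find_unconnected() {
  awk '
    # An unindented line names the resource.
    /^[^ ]/ { res = $1; next }
    # Indented peer line with a connection: field; "connection:" is 11 chars.
    match($0, /connection:[A-Za-z]+/) {
      print res, $1, substr($0, RSTART + 11, RLENGTH - 11)
    }
  '
}

# Usage on a node: drbdadm status influxdb | find_unconnected
```
</pre>
<p class="MsoNormal"><span lang="EN-US">On Node 1, for example, this would be expected to print a single line naming pbzne4demo-n2.wp.lan as StandAlone.<o:p></o:p></span></p>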
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I tried disconnecting and reconnecting the resource on every node, but the StandAlone state always persists between the same two nodes.<o:p></o:p></span></p>
<p class="MsoNormal">What I tried:<o:p></o:p></p>
<p class="MsoNormal"><span lang="EN-US">1. Disconnected the resource on all nodes, ran connect on the primary node, then ran connect --discard-my-data on both secondary nodes.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The StandAlone state remains.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">/var/log/messages shows the following on the secondary nodes:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 2<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Handshake to peer 1 successful: Agreed network protocol version 114<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [7948])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: incompatible discard-my-data settings<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Connecting -> Disconnecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: ack_receiver terminated<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating ack_recv thread<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Connection closed<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Disconnecting -> StandAlone )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating receiver thread<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:10 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Preparing remote state change 271906619<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:10 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Committing remote state change 271906619 (primary_nodes=8)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 1<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( StandAlone -> Unconnected )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting receiver thread (from drbd_w_influxdb [6596])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Unconnected -> Connecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: conn( StandAlone -> Unconnected )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Starting receiver thread (from drbd_w_influxdb [6596])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: conn( Unconnected -> Connecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Handshake to peer 2 successful: Agreed network protocol version 114<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [30208])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: incompatible discard-my-data settings<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Connecting -> Disconnecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Handshake to peer 3 successful: Agreed network protocol version 114<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: ack_receiver terminated<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating ack_recv thread<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [30210])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Connection closed<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Disconnecting -> StandAlone )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating receiver thread<o:p></o:p></span></p>
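<p class="MsoNormal"><span lang="EN-US">For reference, step 1 corresponds roughly to the following commands. It is a sketch, not a tested procedure: it assumes root SSH access from an admin host to each node, and with DRY_RUN=1 (the default) it only prints each command prefixed by the node it would run on. The helper names run_on and step1 are my own.<o:p></o:p></span></p>
<pre>
```shell
#!/bin/sh
# Sketch of step 1 for resource influxdb (hypothetical helper names).
# Assumptions: root SSH from an admin host to each node; with DRY_RUN=1
# (the default) each command is only printed, prefixed by its node.
RES=influxdb
PRIMARY=pbzne4demo-n3.wp.lan
SECONDARIES="pbzne4demo-n1.wp.lan pbzne4demo-n2.wp.lan"
DRY_RUN=${DRY_RUN:-1}

# Run (or just print) a command on the given node.
run_on() {
  node=$1
  shift
  if [ "$DRY_RUN" = 1 ]; then
    echo "[$node] $*"
  else
    ssh "root@$node" "$@"
  fi
}

step1() {
  # Disconnect the resource on every node.
  for n in $PRIMARY $SECONDARIES; do
    run_on "$n" drbdadm disconnect "$RES"
  done
  # Reconnect on the primary first.
  run_on "$PRIMARY" drbdadm connect "$RES"
  # Reconnect each secondary, asking it to discard its local data.
  for n in $SECONDARIES; do
    run_on "$n" drbdadm connect --discard-my-data "$RES"
  done
}

step1
```
</pre>
<p class="MsoNormal"><span lang="EN-US">Setting DRY_RUN=0 would run the same commands over SSH instead of printing them.<o:p></o:p></span></p>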
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">2. Ran drbdadm adjust on both secondary nodes.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The StandAlone state remains.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">/var/log/messages shows the following on the secondary nodes:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 2<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:01 pbzne4demo-n2 systemd: Started Session 3741 of user root.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:03 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( StandAlone -> Unconnected )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:03 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Starting receiver thread (from drbd_w_influxdb [6563])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:03 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Unconnected -> Connecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Handshake to peer 1 successful: Agreed network protocol version 114<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [8026])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: incompatible discard-my-data settings<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Connecting -> Disconnecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: ack_receiver terminated<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating ack_recv thread<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Connection closed<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Disconnecting -> StandAlone )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating receiver thread<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 1<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:01 pbzne4demo-n1 systemd: Started Session 3754 of user root.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:15 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( StandAlone -> Unconnected )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:15 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting receiver thread (from drbd_w_influxdb [6596])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:15 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Unconnected -> Connecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Handshake to peer 2 successful: Agreed network protocol version 114<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [30273])<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: incompatible discard-my-data settings<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Connecting -> Disconnecting )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: ack_receiver terminated<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating ack_recv thread<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Connection closed<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Disconnecting -> StandAlone )<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating receiver thread<o:p></o:p></span></p>
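<p class="MsoNormal"><span lang="EN-US">Step 2 in the same sketch form (root SSH access and DRY_RUN=1 by default are assumptions; run_on and step2 are my own names). The extra drbdadm -d adjust call was not part of what I ran: -d is the drbdadm dry-run option, which only prints the drbdsetup calls that adjust would make, so it might show what adjust wants to change on the n1-n2 connection.<o:p></o:p></span></p>
<pre>
```shell
#!/bin/sh
# Sketch of step 2: re-apply the configuration of influxdb on both secondaries.
# Assumptions: root SSH to each node; DRY_RUN=1 (default) only prints commands.
RES=influxdb
SECONDARIES="pbzne4demo-n1.wp.lan pbzne4demo-n2.wp.lan"
DRY_RUN=${DRY_RUN:-1}

# Run (or just print) a command on the given node.
run_on() {
  node=$1
  shift
  if [ "$DRY_RUN" = 1 ]; then
    echo "[$node] $*"
  else
    ssh "root@$node" "$@"
  fi
}

step2() {
  for n in $SECONDARIES; do
    # Not in the original step: -d (--dry-run) first prints the drbdsetup
    # calls that adjust would issue, without running them.
    run_on "$n" drbdadm -d adjust "$RES"
    run_on "$n" drbdadm adjust "$RES"
  done
}

step2
```
</pre>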
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">3. Disconnected the resource on all nodes, ran invalidate on both secondary nodes, connected the primary node, then connected both secondary nodes.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The StandAlone state remains.<o:p></o:p></span></p>
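<p class="MsoNormal"><span lang="EN-US">Step 3 in the same sketch form, again assuming root SSH access and printing the commands only while DRY_RUN=1 (run_on and step3 are my own names). Since invalidate tells DRBD to treat the local data on that node as out of sync, each secondary is expected to do a full resync from a peer once it connects.<o:p></o:p></span></p>
<pre>
```shell
#!/bin/sh
# Sketch of step 3 for resource influxdb (hypothetical helper names).
# Assumptions: root SSH to each node; DRY_RUN=1 (default) only prints commands.
RES=influxdb
PRIMARY=pbzne4demo-n3.wp.lan
SECONDARIES="pbzne4demo-n1.wp.lan pbzne4demo-n2.wp.lan"
DRY_RUN=${DRY_RUN:-1}

# Run (or just print) a command on the given node.
run_on() {
  node=$1
  shift
  if [ "$DRY_RUN" = 1 ]; then
    echo "[$node] $*"
  else
    ssh "root@$node" "$@"
  fi
}

step3() {
  # Disconnect the resource on every node.
  for n in $PRIMARY $SECONDARIES; do
    run_on "$n" drbdadm disconnect "$RES"
  done
  # Mark the local data on each secondary as out of sync.
  for n in $SECONDARIES; do
    run_on "$n" drbdadm invalidate "$RES"
  done
  # Reconnect the primary first, then the secondaries.
  run_on "$PRIMARY" drbdadm connect "$RES"
  for n in $SECONDARIES; do
    run_on "$n" drbdadm connect "$RES"
  done
}

step3
```
</pre>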
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I think the next step might involve working with the metadata, but since I am a novice, I'd rather ask for suggestions first. Could you help me resolve this issue?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">This is not a critical system and I can rebuild it, but I'd like to come up with a procedure, and a better understanding of how to handle this kind of case, because I'm sure I will encounter it again.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Best regards,<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt;background:white"><b><span style="font-size:10.0pt;font-family:&quot;Arial&quot;,sans-serif;color:#3D3D3D;mso-fareast-language:IT">Rocco Pezzani</span></b><span lang="EN-US" style="mso-fareast-language:IT"><o:p></o:p></span></p>
</div>
</body>
</html>