<div><div dir="auto">I would try disconnecting or bringing down the resource on either Node1 or Node2, then writing some data on the Primary, and finally connecting or bringing up the resource again. This should trigger a resync of the newly written data to that node.</div></div><div dir="auto">As a last resort, you could either invalidate the data of the affected resource on Node1 or Node2, or re-create its metadata, but that will trigger a full sync, which may not be desirable.</div><div dir="auto">Once you manage to sort this out, consider enabling the quorum feature to avoid split-brain situations in the future.</div><div dir="auto"><br></div><div dir="auto">Gianni</div><div dir="auto"><br></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 17 Jul 2019 at 06:31, Pezzani, Rocco <<a href="mailto:Rocco.Pezzani@wuerth-phoenix.com">Rocco.Pezzani@wuerth-phoenix.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="IT" link="#0563C1" vlink="#954F72">
<div class="m_-600646399096516195WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi all,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I have a 3-node DRBD cluster that has suffered a split-brain. I recovered all resources except one.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">For this resource, the connections Node3-Node1 and Node3-Node2 are fine, but the connection Node1-Node2 is not working: each side reports the other as StandAlone.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 3<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">[root@pbzne4demo-n3 ~]# drbdadm status influxdb<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">influxdb role:Primary<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> disk:UpToDate<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n1.wp.lan role:Secondary<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> peer-disk:UpToDate<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> </span>pbzne4demo-n2.wp.lan role:Secondary<u></u><u></u></p>
<p class="MsoNormal"> <span lang="EN-US">peer-disk:UpToDate<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 2<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">[root@pbzne4demo-n2 ~]# drbdadm status influxdb<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">influxdb role:Secondary<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> disk:UpToDate<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n1.wp.lan connection:StandAlone<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n3.wp.lan role:Primary<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> peer-disk:UpToDate<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 1<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">[root@pbzne4demo-n1 ~]# drbdadm status influxdb<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">influxdb role:Secondary<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> disk:UpToDate<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n2.wp.lan connection:StandAlone<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> pbzne4demo-n3.wp.lan role:Primary<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> peer-disk:UpToDate<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I tried disconnecting and reconnecting the resource on every node, but the StandAlone state always remains between the same two nodes.<u></u><u></u></span></p>
<p class="MsoNormal">What I tried:<u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-US">1. Disconnect from all nodes, connect on the primary node, connect --discard-my-data on both secondary nodes.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Standalone remains.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">/var/log/messages reports this on secondary nodes:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 2<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Handshake to peer 1 successful: Agreed network protocol version 114<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [7948])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: incompatible discard-my-data settings<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Connecting -> Disconnecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: ack_receiver terminated<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating ack_recv thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Connection closed<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Disconnecting -> StandAlone )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating receiver thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:10 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Preparing remote state change 271906619<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:10 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Committing remote state change 271906619 (primary_nodes=8)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 1<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( StandAlone -> Unconnected )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting receiver thread (from drbd_w_influxdb [6596])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Unconnected -> Connecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: conn( StandAlone -> Unconnected )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Starting receiver thread (from drbd_w_influxdb [6596])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: conn( Unconnected -> Connecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Handshake to peer 2 successful: Agreed network protocol version 114<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [30208])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: incompatible discard-my-data settings<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Connecting -> Disconnecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Handshake to peer 3 successful: Agreed network protocol version 114<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: ack_receiver terminated<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating ack_recv thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n3.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [30210])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Connection closed<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Disconnecting -> StandAlone )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:16:09 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating receiver thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">2. Run drbdadm adjust on both secondary nodes.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Standalone remains.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">/var/log/messages reports this on secondary nodes:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 2<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:01 pbzne4demo-n2 systemd: Started Session 3741 of user root.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:03 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( StandAlone -> Unconnected )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:03 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Starting receiver thread (from drbd_w_influxdb [6563])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:03 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Unconnected -> Connecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Handshake to peer 1 successful: Agreed network protocol version 114<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [8026])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: incompatible discard-my-data settings<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Connecting -> Disconnecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: ack_receiver terminated<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating ack_recv thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Connection closed<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: conn( Disconnecting -> StandAlone )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n2 kernel: drbd influxdb pbzne4demo-n1.wp.lan: Terminating receiver thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">***Node 1<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:01 pbzne4demo-n1 systemd: Started Session 3754 of user root.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:15 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( StandAlone -> Unconnected )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:15 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting receiver thread (from drbd_w_influxdb [6596])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:15 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Unconnected -> Connecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Handshake to peer 2 successful: Agreed network protocol version 114<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Starting ack_recv thread (from drbd_r_influxdb [30273])<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: incompatible discard-my-data settings<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Connecting -> Disconnecting )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: error receiving P_PROTOCOL, e: -5 l: 1!<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: ack_receiver terminated<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating ack_recv thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Connection closed<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: conn( Disconnecting -> StandAlone )<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Jul 16 12:20:16 pbzne4demo-n1 kernel: drbd influxdb pbzne4demo-n2.wp.lan: Terminating receiver thread<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">3. Disconnect from all nodes, invalidate on both secondary nodes, connect primary node then connect on both secondary nodes<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Standalone remains.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I think the next step might be working with the metadata, but since I am a novice, I’m asking for suggestions. Could you please help me resolve this issue?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">This is not a critical system and I can rebuild it, but I’d like to come up with a procedure and a better understanding of how to handle this kind of case, because I’m sure I will encounter it again.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Best regards,<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt;background:white"><b><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#3d3d3d">Rocco Pezzani</span></b><span lang="EN-US"><u></u><u></u></span></p>
</div>
</div>
</blockquote></div></div>
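<div dir="auto">For reference, the first suggestion at the top of this reply (take the resource down on Node1 or Node2, write on the Primary, bring it back up) could be sketched roughly as follows. This is only a sketch under assumptions: the resource name influxdb is taken from the status output quoted above, while the run wrapper, the APPLY and RES variables, and the step_down and step_up function names are hypothetical helpers made up for illustration. By default it only prints the drbdadm commands instead of running them.</div>
<pre>
```shell
#!/bin/sh
# Sketch only: prints each command by default; set APPLY=1 to really run it.
# RES, APPLY and the helper names below are illustrative assumptions.
RES="${RES:-influxdb}"

# Run the given command when APPLY=1, otherwise just print it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Step 1, on Node1 or Node2: take the resource down on that node.
step_down() {
    run drbdadm down "$RES"
}

# Step 2 happens on the Primary (Node3): write some new data to the
# DRBD-backed filesystem, for example by letting the application run.

# Step 3, back on the node from step 1: bring the resource up again,
# then check whether the peer connection has left StandAlone.
step_up() {
    run drbdadm up "$RES"
    run drbdadm status "$RES"
}
```
</pre>
<div dir="auto">Source the file on the chosen Secondary, call step_down, write data on the Primary, then call step_up. Leaving APPLY unset lets you review the commands first, which seems sensible for a procedure that touches replication state.</div>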