Hi,<div><br></div><div>I'm testing a DRBD+MySQL environment in production, but after a while the second node always gets disconnected, and I can't tell whether it's a hardware problem or a misconfiguration.</div><div>The file system on the second node is not even mounted; I'm only replicating the data, not using it.</div>
<div><br></div><div>The error log is at the end of this message. Here is my configuration:</div><div><br></div><div><br></div><div><div>resource r0 {</div><div> meta-disk internal;</div><div> device /dev/drbd0;</div><div>
disk /dev/sda4;</div><div> </div><div> syncer { rate 33M; }</div><div><br></div><div> handlers {</div><div> split-brain "/etc/init.d/mysql stop";</div><div> }</div><div><br></div>
<div> net { </div><div> allow-two-primaries; </div><div> after-sb-0pri discard-zero-changes;</div><div> after-sb-1pri discard-secondary;</div><div> after-sb-2pri disconnect;</div>
<div> data-integrity-alg crc32c;</div><div> ko-count 4;</div><div> }</div><div><br></div><div> startup { become-primary-on both; }</div><div> </div><div> on stewart { address <a href="http://192.168.0.1:7789">192.168.0.1:7789</a>; }</div>
<div> on prost { address <a href="http://192.168.0.2:7789">192.168.0.2:7789</a>; }</div><div>}</div></div><div><br></div><div><br></div><div>Is there something wrong with my configuration? Should I change anything?</div><div>
Another problem is that after the second node gets disconnected, I have to reconnect it manually with "drbdadm connect r0". Apparently, after that the nodes resync quickly (in less than a minute), but the previously disconnected node comes up as Secondary, so I also have to run "drbdadm primary r0".</div>
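<div><br></div><div>The two manual steps above map onto the connection state (cs:) and local role (ro:) fields of the resource line in /proc/drbd. A minimal sketch, assuming the DRBD 8.3 /proc/drbd line format; parse_status and suggest_commands are hypothetical helpers, the sample line is illustrative rather than captured from these servers, and the code only reports which commands would apply, it never runs drbdadm.</div>

```python
import re

# Assumption: DRBD 8.3 /proc/drbd resource line format, for example
#   " 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----"
# cs: is the connection state, ro: is "local role/peer role".
FIELD_RE = re.compile(r"\b(cs|ro|ds):(\S+)")


def parse_status(line):
    """Return the cs/ro/ds fields of one resource line as a dict."""
    return dict(FIELD_RE.findall(line))


def suggest_commands(line, resource="r0"):
    """Hypothetical helper: list the manual drbdadm steps that would apply.

    It only inspects the status line; nothing is executed.
    """
    status = parse_status(line)
    commands = []
    if status.get("cs") == "StandAlone":
        # StandAlone: DRBD is not trying to reconnect on its own.
        commands.append(f"drbdadm connect {resource}")
    local_role = status.get("ro", "").split("/")[0]
    if local_role == "Secondary":
        commands.append(f"drbdadm primary {resource}")
    return commands


if __name__ == "__main__":
    # Illustrative sample line, not taken from these servers.
    sample = " 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----"
    print(suggest_commands(sample))
```
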
<div><br></div><div>Both nodes are Dell PowerEdge R710 servers with 48 GB of RAM, running RHEL 5.6 and DRBD 8.3.10 (from ELRepo).</div><div><br></div><div>Am I missing something here?</div><div> </div><div><br></div><div>Thanks for any help!</div>
<div><br></div><div>Regards,</div><div>Thiago Vinhas</div><div><div>block drbd0: Digest integrity check FAILED: 63266864s +4096</div><div>block drbd0: error receiving Data, l: 4136!</div><div>block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown ) </div>
<div>block drbd0: new current UUID 66983E6BBEE733F5:6157ABDB87926AA5:0001000000000001:5905CD0F6B61A6A9</div><div>block drbd0: asender terminated</div><div>block drbd0: Terminating asender thread</div><div>block drbd0: Connection closed</div>
<div>block drbd0: conn( ProtocolError -> Unconnected ) </div><div>block drbd0: receiver terminated</div><div>block drbd0: Restarting receiver thread</div><div>block drbd0: receiver (re)started</div><div>block drbd0: conn( Unconnected -> WFConnection ) </div>
<div>block drbd0: Handshake successful: Agreed network protocol version 96</div><div>block drbd0: conn( WFConnection -> WFReportParams ) </div><div>block drbd0: Starting asender thread (from drbd0_receiver [7794])</div>
<div>block drbd0: data-integrity-alg: md5</div><div>block drbd0: drbd_sync_handshake:</div><div>block drbd0: self 66983E6BBEE733F5:6157ABDB87926AA5:0001000000000001:5905CD0F6B61A6A9 bits:0 flags:0</div><div>block drbd0: peer 4C9FC71A2D13AF9F:6157ABDB87926AA5:0001000000000000:5905CD0F6B61A6A9 bits:40 flags:0</div>
<div>block drbd0: uuid_compare()=100 by rule 90</div><div>block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0</div><div>block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)</div>
<div>block drbd0: Split-Brain detected but unresolved, dropping connection!</div><div>block drbd0: helper command: /sbin/drbdadm split-brain minor-0</div><div>block drbd0: meta connection shut down by peer.</div><div>block drbd0: conn( WFReportParams -> NetworkFailure ) </div>
<div>block drbd0: asender terminated</div><div>block drbd0: Terminating asender thread</div><div>block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)</div><div>block drbd0: conn( NetworkFailure -> Disconnecting ) </div>
<div>block drbd0: error receiving ReportState, l: 4!</div><div>block drbd0: Connection closed</div><div>block drbd0: conn( Disconnecting -> StandAlone ) </div><div>block drbd0: receiver terminated</div><div>block drbd0: Terminating receiver thread</div>
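<div><br></div><div>The connection-state changes in the log above can be pulled out with a few lines of Python. This is a sketch, not a DRBD tool: it assumes only the "conn( OLD -> NEW )" message format visible in this log, conn_transitions is a hypothetical helper name, and the sample text is an excerpt of the log lines above.</div>

```python
import re

# Matches DRBD kernel messages such as
#   "block drbd0: conn( Connected -> ProtocolError )"
# Assumption: the "conn( OLD -> NEW )" format shown in the log above.
CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")


def conn_transitions(log_text):
    """Return (old_state, new_state) pairs in the order they appear."""
    return CONN_RE.findall(log_text)


# Excerpt of the kernel log above.
LOG = """\
block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
block drbd0: conn( ProtocolError -> Unconnected )
block drbd0: conn( Unconnected -> WFConnection )
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Split-Brain detected but unresolved, dropping connection!
block drbd0: conn( WFReportParams -> NetworkFailure )
block drbd0: conn( NetworkFailure -> Disconnecting )
block drbd0: conn( Disconnecting -> StandAlone )
"""

if __name__ == "__main__":
    for old, new in conn_transitions(LOG):
        print(f"{old} -> {new}")
```

<div>Run on this excerpt it prints seven transitions, from Connected -> ProtocolError through Disconnecting -> StandAlone. StandAlone is the state in which DRBD no longer tries to reconnect by itself, which matches having to issue "drbdadm connect r0" by hand.</div>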
</div>