<p dir="ltr">Try setting ping-timeout to 5 in the net section of your drbd.conf.</p>
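<p dir="ltr">A minimal sketch of where that setting could go, assuming it is added to the existing net section under common in the config you posted. As far as I recall from the 8.3 drbd.conf man page, the value is given in tenths of a second, so 5 would mean half a second; please check the man page for your version before relying on that.</p>
<pre>
common {
    net {
        # assumed placement: next to the existing after-sb-* options
        ping-timeout 5;    # tenths of a second, per the 8.3 man page
    }
}
</pre>
<p dir="ltr">After editing the file on both nodes, running drbdadm adjust all should apply the change without a full restart.</p>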
<p dir="ltr">-- <br>
Adam Randall<br>
<a href="http://www.xaren.net">http://www.xaren.net</a><br>
AIM: blitz574<br>
Twitter: @randalla0622</p>
<p dir="ltr">"To err is human... to really foul up requires the root password."</p>
<div class="gmail_quote">On Dec 5, 2015 5:23 AM, "Fabrizio Zelaya" &lt;<a href="mailto:FZelaya@ta-petro.com">FZelaya@ta-petro.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><font size="2" face="sans-serif">Hello Everyone.</font>
<br>
<br><font size="2" face="sans-serif">I have set up 2 servers with 2 DRBD
resources. The servers start fine, the connection is established, and everything
works for a while, but at some point (it could be hours, but never
more than 1 day) the DRBD resources fall into StandAlone status.</font>
<br>
<br><font size="2" face="sans-serif">In /var/log/messages I can see the following
as the connection gets lost:</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: sock was shut down by peer</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe )
pdsk( UpToDate -> DUnknown ) </font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: short read expecting header on sock: r=0</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: new current UUID 0DA9D7241DAA80E7:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: PingAck did not arrive in time.</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: asender terminated</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: Terminating drbd1_asender</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: Connection closed</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: conn( BrokenPipe -> Unconnected ) </font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: receiver terminated</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: Restarting drbd1_receiver</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: receiver (re)started</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:20 host2 kernel: block
drbd1: conn( Unconnected -> WFConnection ) </font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: Handshake successful: Agreed network protocol version 97</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: conn( WFConnection -> WFReportParams ) </font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: Starting asender thread (from drbd1_receiver [2860])</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: data-integrity-alg: &lt;not-used&gt;</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: drbd_sync_handshake:</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: self 0DA9D7241DAA80E7:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79
bits:0 flags:0</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: peer 6FB7C41C2FB85275:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79
bits:0 flags:0</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: uuid_compare()=100 by rule 90</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code
0 (0x0)</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: Split-Brain detected but unresolved, dropping connection!</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: helper command: /sbin/drbdadm split-brain minor-1</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 notify-split-brain.sh[6540]:
invoked for vms1</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: conn( WFReportParams -> Disconnecting ) </font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: error receiving ReportState, l: 4!</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: asender terminated</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: Terminating drbd1_asender</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: Connection closed</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: conn( Disconnecting -> StandAlone ) </font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: receiver terminated</font>
<br><font size="2" face="sans-serif">Dec 3 13:56:21 host2 kernel: block
drbd1: Terminating drbd1_receiver</font>
<br>
<br><font size="2" face="sans-serif">As you can see, this is for one resource.
If I do nothing (usually I restart drbd to recover), the second
resource eventually fails too. The order in which the resources fail has been completely
random.</font>
<br>
<br><font size="2" face="sans-serif">The connection between the 2 servers
is directly through a single cable (straight, not a crossover) </font>
<br>
<br><font size="2" face="sans-serif">I have monitored ping between the servers
while it happens and I see no lost packets at all. </font>
<br>
<br><font size="2" face="sans-serif">I also have NIS (ypserv) configured
and that connection doesn't get lost either.</font>
<br>
<br><font size="2" face="sans-serif">The connection doesn't re-establish
by itself; the way to get it back has been to restart the drbd service on both
servers.</font>
<br>
<br><font size="2" face="sans-serif">Any ideas about what might be causing this
instability?</font>
<br>
<br><font size="2" face="sans-serif">Here is some general configuration
info that might shed a bit of light on the issue: </font>
<br>
<br><font size="2" face="sans-serif"> # rpm -qa|grep drbd</font>
<br><font size="2" face="sans-serif"><i>drbd83-utils-8.3.16-1.el6.elrepo.x86_64</i></font>
<br><font size="2" face="sans-serif"><i>kmod-drbd83-8.3.16-3.el6.elrepo.x86_64</i></font>
<br>
<br><font size="2" face="sans-serif"># cat /etc/redhat-release </font>
<br><font size="2" face="sans-serif"><i>Scientific Linux release 6.7 (Carbon)</i></font>
<br>
<br>
<br><font size="2" face="sans-serif"># drbdadm dump all</font>
<br>
<br><font size="2" face="sans-serif"><i># /etc/drbd.conf</i></font>
<br><font size="2" face="sans-serif"><i>common {</i></font>
<br><font size="2" face="sans-serif"><i> protocol
C;</i></font>
<br><font size="2" face="sans-serif"><i> net {</i></font>
<br><font size="2" face="sans-serif"><i> after-sb-0pri
discard-zero-changes;</i></font>
<br><font size="2" face="sans-serif"><i> after-sb-1pri
discard-secondary;</i></font>
<br><font size="2" face="sans-serif"><i> after-sb-2pri
disconnect;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> syncer {</i></font>
<br><font size="2" face="sans-serif"><i> rate
33M;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> handlers {</i></font>
<br><font size="2" face="sans-serif"><i> pri-on-incon-degr
"/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh;
echo b > /proc/sysrq-trigger ; reboot -f";</i></font>
<br><font size="2" face="sans-serif"><i> pri-lost-after-sb
"/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh;
echo b > /proc/sysrq-trigger ; reboot -f";</i></font>
<br><font size="2" face="sans-serif"><i> local-io-error
"/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh;
echo o > /proc/sysrq-trigger ; halt -f";</i></font>
<br><font size="2" face="sans-serif"><i> split-brain
"/usr/lib/drbd/notify-split-brain.sh root";</i></font>
<br><font size="2" face="sans-serif"><i> out-of-sync
"/usr/lib/drbd/notify-out-of-sync.sh root";</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i>}</i></font>
<br>
<br><font size="2" face="sans-serif"><i># resource vms1 on host2: not ignored,
not stacked</i></font>
<br><font size="2" face="sans-serif"><i>resource vms1 {</i></font>
<br><font size="2" face="sans-serif"><i> on host1 {</i></font>
<br><font size="2" face="sans-serif"><i> device
/dev/drbd1 minor 1;</i></font>
<br><font size="2" face="sans-serif"><i> disk
/dev/sda2;</i></font>
<br><font size="2" face="sans-serif"><i> address
ipv4 192.168.100.60:7789;</i></font>
<br><font size="2" face="sans-serif"><i> meta-disk
internal;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> on host2 {</i></font>
<br><font size="2" face="sans-serif"><i> device
/dev/drbd1 minor 1;</i></font>
<br><font size="2" face="sans-serif"><i> disk
/dev/sda2;</i></font>
<br><font size="2" face="sans-serif"><i> address
ipv4 192.168.100.61:7789;</i></font>
<br><font size="2" face="sans-serif"><i> meta-disk
internal;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> net {</i></font>
<br><font size="2" face="sans-serif"><i> allow-two-primaries;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> startup {</i></font>
<br><font size="2" face="sans-serif"><i> become-primary-on
both;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i>}</i></font>
<br>
<br><font size="2" face="sans-serif"><i># resource vms2 on host2: not ignored,
not stacked</i></font>
<br><font size="2" face="sans-serif"><i>resource vms2 {</i></font>
<br><font size="2" face="sans-serif"><i> on host1 {</i></font>
<br><font size="2" face="sans-serif"><i> device
/dev/drbd2 minor 2;</i></font>
<br><font size="2" face="sans-serif"><i> disk
/dev/sda3;</i></font>
<br><font size="2" face="sans-serif"><i> address
ipv4 192.168.100.60:7790;</i></font>
<br><font size="2" face="sans-serif"><i> meta-disk
internal;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> on host2 {</i></font>
<br><font size="2" face="sans-serif"><i> device
/dev/drbd2 minor 2;</i></font>
<br><font size="2" face="sans-serif"><i> disk
/dev/sda3;</i></font>
<br><font size="2" face="sans-serif"><i> address
ipv4 192.168.100.61:7790;</i></font>
<br><font size="2" face="sans-serif"><i> meta-disk
internal;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> net {</i></font>
<br><font size="2" face="sans-serif"><i> allow-two-primaries;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i> startup {</i></font>
<br><font size="2" face="sans-serif"><i> become-primary-on
both;</i></font>
<br><font size="2" face="sans-serif"><i> }</i></font>
<br><font size="2" face="sans-serif"><i>}</i></font>
<br>
<br>
<br><font size="2" face="sans-serif">Thank you in advance for your help</font>
<br>
<br><font size="2" face="sans-serif">Fabrizio Zelaya </font>
<br><br>_______________________________________________<br>
drbd-user mailing list<br>
<a href="mailto:drbd-user@lists.linbit.com">drbd-user@lists.linbit.com</a><br>
<a href="http://lists.linbit.com/mailman/listinfo/drbd-user" rel="noreferrer" target="_blank">http://lists.linbit.com/mailman/listinfo/drbd-user</a><br>
<br></blockquote></div>