Try setting ping-timeout to 5
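
The reason for the suggestion is the "PingAck did not arrive in time." line in your log. As a rough sketch of where the option would go, assuming you keep your shared net options in the common section as in the configuration quoted below (to my knowledge DRBD 8.3 reads ping-timeout in tenths of a second, so 5 would mean 500 ms; please check the drbd.conf man page for your version):

```
common {
  net {
    # assumption: unit is tenths of a second, so 5 = 500 ms
    ping-timeout 5;
  }
}
```

The change has to be made on both nodes, and running "drbdadm adjust all" afterwards should apply it to the running resources.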
--
Adam Randall
http://www.xaren.net
AIM: blitz574
Twitter: @randalla0622
"To err is human... to really foul up requires the root password."
On Dec 5, 2015 5:23 AM, "Fabrizio Zelaya" <FZelaya at ta-petro.com> wrote:
> Hello Everyone.
>
> I have set up 2 servers with 2 drbd resources. Servers start fine and the
> connection is established and everything works fine for a while, but at
> some point (it could be hours but never more than 1 day) the drbd resources
> fall into a StandAlone status.
>
> In /var/log/messages I can see the following as the connection gets lost:
> Dec 3 13:56:20 host2 kernel: block drbd1: sock was shut down by peer
> Dec 3 13:56:20 host2 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Dec 3 13:56:20 host2 kernel: block drbd1: short read expecting header on sock: r=0
> Dec 3 13:56:20 host2 kernel: block drbd1: new current UUID 0DA9D7241DAA80E7:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79
> Dec 3 13:56:20 host2 kernel: block drbd1: PingAck did not arrive in time.
> Dec 3 13:56:20 host2 kernel: block drbd1: asender terminated
> Dec 3 13:56:20 host2 kernel: block drbd1: Terminating drbd1_asender
> Dec 3 13:56:20 host2 kernel: block drbd1: Connection closed
> Dec 3 13:56:20 host2 kernel: block drbd1: conn( BrokenPipe -> Unconnected )
> Dec 3 13:56:20 host2 kernel: block drbd1: receiver terminated
> Dec 3 13:56:20 host2 kernel: block drbd1: Restarting drbd1_receiver
> Dec 3 13:56:20 host2 kernel: block drbd1: receiver (re)started
> Dec 3 13:56:20 host2 kernel: block drbd1: conn( Unconnected -> WFConnection )
> Dec 3 13:56:21 host2 kernel: block drbd1: Handshake successful: Agreed network protocol version 97
> Dec 3 13:56:21 host2 kernel: block drbd1: conn( WFConnection -> WFReportParams )
> Dec 3 13:56:21 host2 kernel: block drbd1: Starting asender thread (from drbd1_receiver [2860])
> Dec 3 13:56:21 host2 kernel: block drbd1: data-integrity-alg: <not-used>
> Dec 3 13:56:21 host2 kernel: block drbd1: drbd_sync_handshake:
> Dec 3 13:56:21 host2 kernel: block drbd1: self 0DA9D7241DAA80E7:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79 bits:0 flags:0
> Dec 3 13:56:21 host2 kernel: block drbd1: peer 6FB7C41C2FB85275:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79 bits:0 flags:0
> Dec 3 13:56:21 host2 kernel: block drbd1: uuid_compare()=100 by rule 90
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
> Dec 3 13:56:21 host2 kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
> Dec 3 13:56:21 host2 notify-split-brain.sh[6540]: invoked for vms1
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
> Dec 3 13:56:21 host2 kernel: block drbd1: conn( WFReportParams -> Disconnecting )
> Dec 3 13:56:21 host2 kernel: block drbd1: error receiving ReportState, l: 4!
> Dec 3 13:56:21 host2 kernel: block drbd1: asender terminated
> Dec 3 13:56:21 host2 kernel: block drbd1: Terminating drbd1_asender
> Dec 3 13:56:21 host2 kernel: block drbd1: Connection closed
> Dec 3 13:56:21 host2 kernel: block drbd1: conn( Disconnecting -> StandAlone )
> Dec 3 13:56:21 host2 kernel: block drbd1: receiver terminated
> Dec 3 13:56:21 host2 kernel: block drbd1: Terminating drbd1_receiver
>
> As you can see, this is for one resource. If I do nothing (usually I
> restart drbd to recover), eventually the second resource fails too. The
> order in which the resources fail has been completely random.
>
> The connection between the 2 servers is directly through a single cable
> (straight, not a crossover)
>
> I have monitored ping between the servers while it happens and I see no
> lost packets at all.
>
> I also have NIS (ypserv) configured and that connection doesn't get lost
> either.
>
> The connection doesn't re-establish by itself; the way to get it back has
> been to restart the drbd service on both servers.
>
> Any ideas what might be causing this instability?
>
> Here is some general configuration info that might shed a bit of light on
> the issue:
>
> # rpm -qa|grep drbd
> drbd83-utils-8.3.16-1.el6.elrepo.x86_64
> kmod-drbd83-8.3.16-3.el6.elrepo.x86_64
>
> # cat /etc/redhat-release
> Scientific Linux release 6.7 (Carbon)
>
>
> # drbdadm dump all
>
> # /etc/drbd.conf
> common {
>     protocol C;
>     net {
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>     syncer {
>         rate 33M;
>     }
>     handlers {
>         pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>     }
> }
>
> # resource vms1 on host2: not ignored, not stacked
> resource vms1 {
>     on host1 {
>         device /dev/drbd1 minor 1;
>         disk /dev/sda2;
>         address ipv4 192.168.100.60:7789;
>         meta-disk internal;
>     }
>     on host2 {
>         device /dev/drbd1 minor 1;
>         disk /dev/sda2;
>         address ipv4 192.168.100.61:7789;
>         meta-disk internal;
>     }
>     net {
>         allow-two-primaries;
>     }
>     startup {
>         become-primary-on both;
>     }
> }
>
> # resource vms2 on host2: not ignored, not stacked
> resource vms2 {
>     on host1 {
>         device /dev/drbd2 minor 2;
>         disk /dev/sda3;
>         address ipv4 192.168.100.60:7790;
>         meta-disk internal;
>     }
>     on host2 {
>         device /dev/drbd2 minor 2;
>         disk /dev/sda3;
>         address ipv4 192.168.100.61:7790;
>         meta-disk internal;
>     }
>     net {
>         allow-two-primaries;
>     }
>     startup {
>         become-primary-on both;
>     }
> }
>
>
> Thank you in advance for your help
>
> Fabrizio Zelaya