Try setting ping-timeout to 5

--
Adam Randall
http://www.xaren.net
AIM: blitz574
Twitter: @randalla0622

"To err is human... to really foul up requires the root password."

On Dec 5, 2015 5:23 AM, "Fabrizio Zelaya" <FZelaya at ta-petro.com> wrote:

> Hello Everyone.
>
> I have set up 2 servers with 2 drbd resources. Servers start fine and the
> connection is established and everything works fine for a while, but at
> some point (it could be hours but never more than 1 day) the drbd resources
> fall into a StandAlone status.
>
> On /var/log/messages I can see the following as the connection gets lost:
> Dec 3 13:56:20 host2 kernel: block drbd1: sock was shut down by peer
> Dec 3 13:56:20 host2 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Dec 3 13:56:20 host2 kernel: block drbd1: short read expecting header on sock: r=0
> Dec 3 13:56:20 host2 kernel: block drbd1: new current UUID 0DA9D7241DAA80E7:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79
> Dec 3 13:56:20 host2 kernel: block drbd1: PingAck did not arrive in time.
> Dec 3 13:56:20 host2 kernel: block drbd1: asender terminated
> Dec 3 13:56:20 host2 kernel: block drbd1: Terminating drbd1_asender
> Dec 3 13:56:20 host2 kernel: block drbd1: Connection closed
> Dec 3 13:56:20 host2 kernel: block drbd1: conn( BrokenPipe -> Unconnected )
> Dec 3 13:56:20 host2 kernel: block drbd1: receiver terminated
> Dec 3 13:56:20 host2 kernel: block drbd1: Restarting drbd1_receiver
> Dec 3 13:56:20 host2 kernel: block drbd1: receiver (re)started
> Dec 3 13:56:20 host2 kernel: block drbd1: conn( Unconnected -> WFConnection )
> Dec 3 13:56:21 host2 kernel: block drbd1: Handshake successful: Agreed network protocol version 97
> Dec 3 13:56:21 host2 kernel: block drbd1: conn( WFConnection -> WFReportParams )
> Dec 3 13:56:21 host2 kernel: block drbd1: Starting asender thread (from drbd1_receiver [2860])
> Dec 3 13:56:21 host2 kernel: block drbd1: data-integrity-alg: <not-used>
> Dec 3 13:56:21 host2 kernel: block drbd1: drbd_sync_handshake:
> Dec 3 13:56:21 host2 kernel: block drbd1: self 0DA9D7241DAA80E7:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79 bits:0 flags:0
> Dec 3 13:56:21 host2 kernel: block drbd1: peer 6FB7C41C2FB85275:C4DC8617C18594B1:FBC08C5F22389C79:FBBF8C5F22389C79 bits:0 flags:0
> Dec 3 13:56:21 host2 kernel: block drbd1: uuid_compare()=100 by rule 90
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
> Dec 3 13:56:21 host2 kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
> Dec 3 13:56:21 host2 notify-split-brain.sh[6540]: invoked for vms1
> Dec 3 13:56:21 host2 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
> Dec 3 13:56:21 host2 kernel: block drbd1: conn( WFReportParams -> Disconnecting )
> Dec 3 13:56:21 host2 kernel: block drbd1: error receiving ReportState, l: 4!
> Dec 3 13:56:21 host2 kernel: block drbd1: asender terminated
> Dec 3 13:56:21 host2 kernel: block drbd1: Terminating drbd1_asender
> Dec 3 13:56:21 host2 kernel: block drbd1: Connection closed
> Dec 3 13:56:21 host2 kernel: block drbd1: conn( Disconnecting -> StandAlone )
> Dec 3 13:56:21 host2 kernel: block drbd1: receiver terminated
> Dec 3 13:56:21 host2 kernel: block drbd1: Terminating drbd1_receiver
>
> As you can see, this is for one resource. If I do nothing (usually I
> restart drbd to recover), eventually the second resource fails too. The
> order in which the resources fail has been completely random.
>
> The connection between the 2 servers is directly through a single cable
> (straight, not a crossover).
>
> I have monitored ping between the servers while it happens and I get no
> lost packets at all.
>
> I also have NIS (ypserv) configured and that connection doesn't get lost
> either.
>
> The connection doesn't re-establish by itself; the way to get it back has
> been to restart the drbd service on both servers.
>
> Any ideas what might be causing this instability?
>
> Here is some general configuration info that might shed a bit of light on
> the issue:
>
> # rpm -qa|grep drbd
> drbd83-utils-8.3.16-1.el6.elrepo.x86_64
> kmod-drbd83-8.3.16-3.el6.elrepo.x86_64
>
> # cat /etc/redhat-release
> Scientific Linux release 6.7 (Carbon)
>
>
> # drbdadm dump all
>
> # /etc/drbd.conf
> common {
>     protocol C;
>     net {
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>     syncer {
>         rate 33M;
>     }
>     handlers {
>         pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>     }
> }
>
> # resource vms1 on host2: not ignored, not stacked
> resource vms1 {
>     on host1 {
>         device /dev/drbd1 minor 1;
>         disk /dev/sda2;
>         address ipv4 192.168.100.60:7789;
>         meta-disk internal;
>     }
>     on host2 {
>         device /dev/drbd1 minor 1;
>         disk /dev/sda2;
>         address ipv4 192.168.100.61:7789;
>         meta-disk internal;
>     }
>     net {
>         allow-two-primaries;
>     }
>     startup {
>         become-primary-on both;
>     }
> }
>
> # resource vms2 on host2: not ignored, not stacked
> resource vms2 {
>     on host1 {
>         device /dev/drbd2 minor 2;
>         disk /dev/sda3;
>         address ipv4 192.168.100.60:7790;
>         meta-disk internal;
>     }
>     on host2 {
>         device /dev/drbd2 minor 2;
>         disk /dev/sda3;
>         address ipv4 192.168.100.61:7790;
>         meta-disk internal;
>     }
>     net {
>         allow-two-primaries;
>     }
>     startup {
>         become-primary-on both;
>     }
>
>
> Thank you in advance for your help
>
> Fabrizio Zelaya
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
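
For reference, the ping-timeout suggestion from the reply could be applied in the net section of the common block quoted above. This is only a minimal sketch, assuming DRBD 8.3 syntax, where ping-timeout is a net option expressed in tenths of a second. The placement in the common section is an assumption; it could equally go in each resource's own net section. It is worth checking drbd.conf(5) for the installed version to confirm the unit and the default, since the suggested value may already be what is in effect.

```
common {
    protocol C;
    net {
        # Assumed placement: time the peer has to answer a keep-alive
        # packet, in tenths of a second (5 = 0.5 s) per drbd.conf(5).
        ping-timeout 5;

        # Existing split-brain policies from the quoted configuration.
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
}
```

After making the same change on both nodes, running `drbdadm adjust all` on each should apply it to the running resources without a full service restart.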