[DRBD-user] Constant replication problems with DRBD 8.3.10

cesar brain at click.com.py
Sat Jun 15 10:29:40 CEST 2013

Hello everyone

*Urgent, please: my servers are in production*

I have a serious problem and need help.

*My scenario*
- Two ASUS P8H77-M PRO workstations with Intel Core i7 CPUs, running Proxmox VE 2.3 and DRBD 8.3.10, with LVM on top of DRBD
- Two Realtek RTL8111/8168 PCI-E 1 Gb/s NICs in a round-robin bond, used only for DRBD

After a while, it shows me this:

shell# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root at sighted, 2012-10-09 12:47:51
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
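On each resource line of this output, cs: is the connection state, ro: the local/peer roles and ds: the local/peer disk states. So resource 0 (r0) is StandAlone with an unknown peer, while resource 1 (r1) is still Connected. Below is a minimal Python sketch, assuming the 8.3-style line format shown above, that pulls out those three fields and flags disconnected resources; parse_drbd_status and disconnected are hypothetical helper names, not part of any DRBD tool.

```python
import re

# Assumption: 8.3-style resource lines such as
# " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----".
# Only the cs/ro/ds fields are extracted; the counter lines are skipped.
RESOURCE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "ro": ..., "ds": ...}} for each resource line."""
    resources = {}
    for line in text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:
            resources[int(match.group("minor"))] = {
                "cs": match.group("cs"),
                "ro": match.group("ro"),
                "ds": match.group("ds"),
            }
    return resources


def disconnected(resources):
    """Minors whose connection state is anything other than Connected."""
    return sorted(m for m, f in resources.items() if f["cs"] != "Connected")


SAMPLE = """\
version: 8.3.13 (api:88/proto:86-96)
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

if __name__ == "__main__":
    status = parse_drbd_status(SAMPLE)
    print(status[0]["cs"], disconnected(status))  # StandAlone [0]
```

On a live node the same function could be fed the contents of /proc/drbd instead of SAMPLE.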

*This is my configuration:*

*File global_common.conf:*
global { usage-count no;
}

common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }

        startup {
        }

        disk { on-io-error detach;
        }

        net { sndbuf-size 0; no-tcp-cork; unplug-watermark 16; max-buffers 8000; max-epoch-size 8000;
                data-integrity-alg sha1;
        }

        syncer { rate 75M; al-extents 3389; cpu-mask 0; verify-alg "sha1";
        }
}

*File r0.res:*
resource r0 {
  protocol C;
  startup {
    wfc-timeout 15;
    degr-wfc-timeout 60;
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on kvm5 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.2.2.50:7788;
    meta-disk internal;
  }
  on kvm6 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.2.2.51:7788;
    meta-disk internal;
  }
}
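Which of the three after-sb options applies depends on how many nodes are Primary when the split brain is detected: after-sb-0pri with none, after-sb-1pri with one, after-sb-2pri with both. With become-primary-on both, kvm5 and kvm6 are both Primary, so for r0 the relevant setting is after-sb-2pri, which is disconnect. A minimal Python sketch of that lookup follows; the R0_NET dictionary simply copies the net section above, and after_sb_policy is a hypothetical illustration, not DRBD code.

```python
# The after-sb-* settings copied from the net section of r0.res.
R0_NET = {
    "after-sb-0pri": "discard-zero-changes",
    "after-sb-1pri": "discard-secondary",
    "after-sb-2pri": "disconnect",
}


def after_sb_policy(net_options, primaries):
    """Return (option name, configured policy) for a split brain detected
    while `primaries` nodes (0, 1 or 2) hold the Primary role."""
    if primaries not in (0, 1, 2):
        raise ValueError("a two-node resource has 0, 1 or 2 primaries")
    option = f"after-sb-{primaries}pri"
    return option, net_options[option]


if __name__ == "__main__":
    # become-primary-on both: kvm5 and kvm6 are both Primary.
    print(after_sb_policy(R0_NET, 2))  # ('after-sb-2pri', 'disconnect')
```

With disconnect, DRBD does not discard either side's changes automatically; it drops the connection, which is consistent with resource 0 sitting in cs:StandAlone until the split brain is resolved by hand.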

*File r1.res:*
resource r1 {
  protocol C;
  startup {
    wfc-timeout 15;
    degr-wfc-timeout 60;
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on kvm5 {
    device /dev/drbd1;
    disk /dev/sdb3;
    address 10.2.2.50:7789;
    meta-disk internal;
  }
  on kvm6 {
    device /dev/drbd1;
    disk /dev/sdb3;
    address 10.2.2.51:7789;
    meta-disk internal;
  }
}

*Note:*
I use "data-integrity-alg sha1;" in the net section because data integrity is very important to me.
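With data-integrity-alg enabled, the sending node computes a digest of each data block and the receiving node recomputes it on arrival; if the two differ, the block is rejected and the connection is dropped. If an upper layer rewrites the page after the digest was taken but before the data goes out, the digests disagree even though disks and network are healthy. The following is a minimal Python simulation of that timing effect using hashlib.sha1; send_block and receiver_accepts are hypothetical names, and this models the idea only, not DRBD's actual code path.

```python
import hashlib


def sha1_digest(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()


def send_block(buffer: bytearray, modify_during_send: bool):
    """Simulate one replicated write: the digest is computed first, then the
    payload is read from the (possibly changed) buffer and sent."""
    digest_sent = sha1_digest(bytes(buffer))  # sender computes the digest
    if modify_during_send:
        buffer[0] ^= 0xFF                     # upper layer rewrites the page
    payload = bytes(buffer)                   # data actually put on the wire
    return digest_sent, payload


def receiver_accepts(digest_sent: bytes, payload: bytes) -> bool:
    """Receiver recomputes the digest and compares it with the one sent."""
    return sha1_digest(payload) == digest_sent


if __name__ == "__main__":
    # One 4 KiB block, matching the "+4096" size in the logs.
    print(receiver_accepts(*send_block(bytearray(4096), modify_during_send=False)))  # True
    print(receiver_accepts(*send_block(bytearray(4096), modify_during_send=True)))   # False
```

Only the timing of the modification causes the failure here, which is the situation the "buffer modified by upper layers during write" message in the Node A log describes.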

*These are my logs:*

*Log in Node A:*
Jun 14 08:07:28 kvm5 kernel: dlm: connecting to 4
Jun 14 08:50:12 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21158352s +4096
Jun 14 08:50:12 kvm5 kernel: block drbd0: sock was reset by peer
Jun 14 08:50:12 kvm5 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 14 08:50:12 kvm5 kernel: block drbd0: short read expecting header on sock: r=-104
Jun 14 08:50:12 kvm5 kernel: block drbd0: meta connection shut down by peer.
Jun 14 08:50:12 kvm5 kernel: block drbd0: new current UUID 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
Jun 14 08:50:12 kvm5 kernel: block drbd0: asender terminated
Jun 14 08:50:12 kvm5 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:12 kvm5 kernel: block drbd0: Connection closed
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver terminated
Jun 14 08:50:12 kvm5 kernel: block drbd0: Restarting receiver thread
Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver (re)started
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 14 08:50:13 kvm5 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 14 08:50:13 kvm5 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1847])
Jun 14 08:50:13 kvm5 kernel: block drbd0: data-integrity-alg: sha1
Jun 14 08:50:13 kvm5 kernel: block drbd0: drbd_sync_handshake:
Jun 14 08:50:13 kvm5 kernel: block drbd0: self 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
Jun 14 08:50:13 kvm5 kernel: block drbd0: peer CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
Jun 14 08:50:13 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Jun 14 08:50:13 kvm5 kernel: block drbd0: error receiving ReportState, l: 4!
Jun 14 08:50:13 kvm5 kernel: block drbd0: asender terminated
Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:13 kvm5 kernel: block drbd0: Connection closed
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 14 08:50:13 kvm5 kernel: block drbd0: receiver terminated
Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating receiver thread

*Log in node B:*
Jun 14 08:07:28 kvm6 kernel: dlm: Using TCP for communications
Jun 14 08:07:28 kvm6 kernel: dlm: got connection from 3
Jun 14 08:50:12 kvm6 kernel: block drbd0: Digest integrity check FAILED: 21158352s +4096
Jun 14 08:50:12 kvm6 kernel: block drbd0: error receiving Data, l: 4140!
Jun 14 08:50:12 kvm6 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Jun 14 08:50:12 kvm6 kernel: block drbd0: new current UUID CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
Jun 14 08:50:12 kvm6 kernel: block drbd0: asender terminated
Jun 14 08:50:12 kvm6 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:12 kvm6 kernel: block drbd0: Connection closed
Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( ProtocolError -> Unconnected )
Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver terminated
Jun 14 08:50:12 kvm6 kernel: block drbd0: Restarting receiver thread
Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver (re)started
Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 14 08:50:13 kvm6 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 14 08:50:13 kvm6 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1857])
Jun 14 08:50:13 kvm6 kernel: block drbd0: data-integrity-alg: sha1
Jun 14 08:50:13 kvm6 kernel: block drbd0: drbd_sync_handshake:
Jun 14 08:50:13 kvm6 kernel: block drbd0: self CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
Jun 14 08:50:13 kvm6 kernel: block drbd0: peer 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
Jun 14 08:50:13 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm6 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 14 08:50:13 kvm6 kernel: block drbd0: meta connection shut down by peer.
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
Jun 14 08:50:13 kvm6 kernel: block drbd0: asender terminated
Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( NetworkFailure -> Disconnecting )
Jun 14 08:50:13 kvm6 kernel: block drbd0: error receiving ReportState, l: 4!
Jun 14 08:50:13 kvm6 kernel: block drbd0: Connection closed
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 14 08:50:13 kvm6 kernel: block drbd0: receiver terminated
Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating receiver thread
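In the handshake each node reports its UUIDs as current:bitmap:history1:history2. Both nodes here have the same bitmap UUID (15B9E4140BB5F41B) but different current UUIDs, because each one generated a new current UUID at 08:50:12 after losing its peer while Primary. That means both sides wrote data after a common starting point, which is what "uuid_compare()=100 by rule 90" reports as a split brain. The following is a heavily simplified Python sketch of that comparison; simplified_uuid_compare and parse_uuids are hypothetical names, and the sketch covers only a few of DRBD's rules (no flags, no history-based rules).

```python
# Return codes follow DRBD's uuid_compare() convention:
# 0 = no resync, 1 = this node is sync source, -1 = this node is sync
# target, 100 = split brain, -1000 = unrelated data.
# Simplified model only; the real function has many more rules.


def strip_flag(uuid: int) -> int:
    """The real comparison masks off the lowest UUID bit before comparing."""
    return uuid & ~1


def simplified_uuid_compare(self_current, self_bitmap, peer_current, peer_bitmap):
    sc, sb = strip_flag(self_current), strip_flag(self_bitmap)
    pc, pb = strip_flag(peer_current), strip_flag(peer_bitmap)
    if sc == pc:
        return 0      # same data generation (simplified)
    if sc == pb:
        return -1     # peer changed data since our generation
    if sb == pc:
        return 1      # we changed data since the peer's generation
    if sb == pb and sb != 0:
        return 100    # both changed data since a common generation (rule 90)
    return -1000      # no common ancestor found in this simplified model


def parse_uuids(field: str):
    """'CURRENT:BITMAP:HIST1:HIST2' from the log -> (current, bitmap) as ints."""
    parts = [int(p, 16) for p in field.split(":")]
    return parts[0], parts[1]


if __name__ == "__main__":
    kvm5 = parse_uuids("76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D")
    kvm6 = parse_uuids("CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D")
    print(simplified_uuid_compare(*kvm5, *kvm6))  # 100
```

Swapping the two nodes also gives 100, matching the identical result logged on kvm6.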



I would be extremely grateful to anyone who can help me.

Best regards
Cesar




--
View this message in context: http://drbd.10923.n7.nabble.com/Replication-problems-constants-with-DRBD-8-3-10-tp17896.html
Sent from the DRBD - User mailing list archive at Nabble.com.