[DRBD-user] Constant replication problems with DRBD 8.3.10

Digimer lists at alteeve.ca
Sat Jun 15 21:18:36 CEST 2013



The first thing that jumps out at me is that round-robin bonding is not
supported; only mode=1 (Active/Passive) is. Secondly, you do not have
fencing, so when the network error occurred, you got a split brain:

 > Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but
 > unresolved, dropping connection!
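
The Linux bonding driver calls mode=1 "active-backup". On Debian-based systems such as Proxmox VE it is usually set with "bond_mode active-backup" in /etc/network/interfaces.

The minimal shell sketch below confirms which mode a bond is actually running in by reading the driver's status file. The function name bond_is_active_backup and the default path /proc/net/bonding/bond0 are assumptions. Pass the status file of whichever bond carries the DRBD traffic.

```shell
#!/bin/sh
# Hypothetical check: is a bond running in active-backup (mode=1)?
# $1 = path to the bond's status file. The default bond0 is an assumption;
#      adjust it to the bond that carries DRBD traffic.
bond_is_active_backup() {
    status_file="${1:-/proc/net/bonding/bond0}"
    # The bonding driver reports mode=1 as "fault-tolerance (active-backup)"
    # and mode=0 as "load balancing (round-robin)".
    grep -q '^Bonding Mode: fault-tolerance (active-backup)' "$status_file"
}
```

For example, "bond_is_active_backup /proc/net/bonding/bond1 || echo 'bond1 is not mode=1'".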

So, switch your bonding to mode=1 and then follow the instructions to 
resolve a split-brain.

http://www.drbd.org/users-guide-8.3/s-resolve-split-brain.html
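
The sketch below wraps the manual recovery that guide describes for DRBD 8.3 in a hypothetical shell function. The function name resolve_split_brain, its victim/survivor argument and the DRY_RUN switch are illustrative only; they are not part of DRBD.

Before running it, decide which node's changes since the split will be thrown away (the "victim"). On that node, stop the VMs and deactivate the LVM volume group on top of the resource so that it can be demoted.

```shell
#!/bin/sh
# Hypothetical helper: manual DRBD 8.3 split-brain recovery for one resource.
# Set DRY_RUN=1 to print the drbdadm commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = resource name (e.g. r0)
# $2 = "victim"   (this node's changes since the split are discarded)
#      "survivor" (this node's data is kept and resynced to the peer)
resolve_split_brain() {
    res="$1"
    role="$2"
    case "$role" in
        victim)
            # Demote first; this fails while VMs/LVM still hold the device.
            run drbdadm secondary "$res" || return 1
            # 8.3 syntax: options after "--" are passed through to drbdsetup.
            run drbdadm -- --discard-my-data connect "$res"
            ;;
        survivor)
            # Only needed if this node is also StandAlone.
            run drbdadm connect "$res"
            ;;
        *)
            echo "usage: resolve_split_brain <resource> victim|survivor" >&2
            return 1
            ;;
    esac
}
```

For example, run "resolve_split_brain r0 victim" on the node whose data you decide to discard. Then run "resolve_split_brain r0 survivor" on the other node if it is also StandAlone.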

Once this is sorted out, configure and use actual fencing (stonith in 
pacemaker terms).
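
Here is a minimal sketch of the DRBD side of fencing, in 8.3 configuration syntax, assuming a Pacemaker cluster. With "fencing resource-and-stonith;", DRBD freezes I/O when a Primary loses its peer and calls the fence-peer handler. This is the policy the 8.3 guide recommends for dual-primary setups.

The crm-fence-peer.sh / crm-unfence-peer.sh paths are the Pacemaker helpers shipped with DRBD 8.3. A cluster stack other than Pacemaker needs a different fence-peer handler. None of this helps unless the cluster itself has working fence (stonith) devices configured.

```
resource r0 {
  disk {
    # Freeze I/O and call fence-peer when the peer is lost (dual-primary).
    fencing resource-and-stonith;
  }
  handlers {
    # Pacemaker helpers shipped with DRBD 8.3 (assumes Pacemaker is in use).
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```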

digimer

On 06/15/2013 04:29 AM, cesar wrote:
> Hello everyone
>
> *Please, urgent: my servers are in production*
>
> I have a serious problem and need help.
>
> *My scenario*
> - I have two ASUS P8H77-M PRO workstations with Intel Core i7 CPUs, running
> Proxmox VE 2.3 and DRBD 8.3.10, with LVM on top of DRBD
> - 2 Realtek RTL8111/8168 PCI-E 1 Gb/s NICs in a round-robin bond, used only
> for DRBD
>
> After a while, it shows this:
>
> shell#cat /proc/drbd
> version: 8.3.13 (api:88/proto:86-96)
> GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root at sighted, 2012-10-09 12:47:51
>   0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
>      ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>   1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>      ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>
> *This is my configuration:*
>
> *File global_common.conf:*
> global { usage-count no;
> }
>
> common {
>          protocol C;
>
>          handlers {
>                  pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>                  pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>                  local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>                  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>                  out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>          }
>
>          startup {
>          }
>
>          disk { on-io-error detach;
>          }
>
>          net { sndbuf-size 0; no-tcp-cork; unplug-watermark 16; max-buffers 8000; max-epoch-size 8000;
>                  data-integrity-alg sha1;
>          }
>
>          syncer { rate 75M; al-extents 3389; cpu-mask 0; verify-alg "sha1";
>          }
> }
>
> *File r0.res:*
> resource r0 {
>    protocol C;
>    startup {
>      wfc-timeout 15;
>      degr-wfc-timeout 60;
>      become-primary-on both;
>    }
>    net {
>      allow-two-primaries;
>      after-sb-0pri discard-zero-changes;
>      after-sb-1pri discard-secondary;
>      after-sb-2pri disconnect;
>    }
>    on kvm5 {
>      device /dev/drbd0;
>      disk /dev/sda3;
>      address 10.2.2.50:7788;
>      meta-disk internal;
>    }
>    on kvm6 {
>      device /dev/drbd0;
>      disk /dev/sda3;
>      address 10.2.2.51:7788;
>      meta-disk internal;
>    }
> }
>
> *File r1.res:*
> resource r1 {
>    protocol C;
>    startup {
>      wfc-timeout 15;
>      degr-wfc-timeout 60;
>      become-primary-on both;
>    }
>    net {
>      allow-two-primaries;
>      after-sb-0pri discard-zero-changes;
>      after-sb-1pri discard-secondary;
>      after-sb-2pri disconnect;
>    }
>    on kvm5 {
>      device /dev/drbd1;
>      disk /dev/sdb3;
>      address 10.2.2.50:7789;
>      meta-disk internal;
>    }
>    on kvm6 {
>      device /dev/drbd1;
>      disk /dev/sdb3;
>      address 10.2.2.51:7789;
>      meta-disk internal;
>    }
> }
>
> *Note:*
> I use "data-integrity-alg sha1;" in the net section because data integrity
> is very important to me.
>
> *These are my logs:*
>
> *Log in Node A:*
> Jun 14 08:07:28 kvm5 kernel: dlm: connecting to 4
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21158352s +4096
> Jun 14 08:50:12 kvm5 kernel: block drbd0: sock was reset by peer
> Jun 14 08:50:12 kvm5 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Jun 14 08:50:12 kvm5 kernel: block drbd0: short read expecting header on sock: r=-104
> Jun 14 08:50:12 kvm5 kernel: block drbd0: meta connection shut down by peer.
> Jun 14 08:50:12 kvm5 kernel: block drbd0: new current UUID 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
> Jun 14 08:50:12 kvm5 kernel: block drbd0: asender terminated
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Connection closed
> Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
> Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver terminated
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Restarting receiver thread
> Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver (re)started
> Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1847])
> Jun 14 08:50:13 kvm5 kernel: block drbd0: data-integrity-alg: sha1
> Jun 14 08:50:13 kvm5 kernel: block drbd0: drbd_sync_handshake:
> Jun 14 08:50:13 kvm5 kernel: block drbd0: self 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: peer CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: error receiving ReportState, l: 4!
> Jun 14 08:50:13 kvm5 kernel: block drbd0: asender terminated
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Connection closed
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: receiver terminated
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating receiver thread
>
> *Log in node B:*
> Jun 14 08:07:28 kvm6 kernel: dlm: Using TCP for communications
> Jun 14 08:07:28 kvm6 kernel: dlm: got connection from 3
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Digest integrity check FAILED: 21158352s +4096
> Jun 14 08:50:12 kvm6 kernel: block drbd0: error receiving Data, l: 4140!
> Jun 14 08:50:12 kvm6 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> Jun 14 08:50:12 kvm6 kernel: block drbd0: new current UUID CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
> Jun 14 08:50:12 kvm6 kernel: block drbd0: asender terminated
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Connection closed
> Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( ProtocolError -> Unconnected )
> Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver terminated
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Restarting receiver thread
> Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver (re)started
> Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1857])
> Jun 14 08:50:13 kvm6 kernel: block drbd0: data-integrity-alg: sha1
> Jun 14 08:50:13 kvm6 kernel: block drbd0: drbd_sync_handshake:
> Jun 14 08:50:13 kvm6 kernel: block drbd0: self CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: peer 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: meta connection shut down by peer.
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: asender terminated
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( NetworkFailure -> Disconnecting )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: error receiving ReportState, l: 4!
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Connection closed
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: receiver terminated
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating receiver thread
>
>
>
> I will be extremely grateful to anyone who can help me.
>
> Best regards
> Cesar
>
>
>
>
> --
> View this message in context: http://drbd.10923.n7.nabble.com/Replication-problems-constants-with-DRBD-8-3-10-tp17896.html
> Sent from the DRBD - User mailing list archive at Nabble.com.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
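
In the /proc/drbd output quoted above, r0 (minor 0) shows "cs:StandAlone ro:Primary/Unknown". That means the node is Primary but has dropped its connection to the peer, which is where the split-brain log lines end up. r1 is still Connected.

A small shell sketch can spot this state from a script or cron job. The function name drbd_status is hypothetical. It only reads the cs:, ro: and ds: fields of the 8.3-style /proc/drbd format shown above.

```shell
#!/bin/sh
# Hypothetical helper: summarise /proc/drbd-style input read from stdin.
# Prints "minor cs ro ds" for each resource line and returns non-zero
# if any resource is not in the Connected state (e.g. StandAlone).
drbd_status() {
    awk '
        # Resource lines start with the minor number, e.g. " 0: cs:..."
        $1 ~ /^[0-9]+:$/ {
            minor = substr($1, 1, length($1) - 1)
            cs = ro = ds = "?"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/)      cs = substr($i, 4)
                else if ($i ~ /^ro:/) ro = substr($i, 4)
                else if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print minor, cs, ro, ds
            if (cs != "Connected") bad = 1
        }
        END { exit bad }
    '
}
```

For example, "drbd_status < /proc/drbd || echo 'a DRBD resource is not Connected'".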


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


