The first thing that jumps out at me is that round-robin bonding (mode=0)
is not supported; only mode=1 (active-backup, i.e. Active/Passive) is.
Secondly, you do not have fencing, so when the network error occurred,
you got a split-brain:
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but
> unresolved, dropping connection!
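Before changing anything, it may help to confirm which mode the bond is
actually running in. The sketch below reads the status file that the Linux
bonding driver exposes; the interface name bond0 and the helper names
bond_mode and is_active_backup are assumptions for illustration, not
something taken from your setup.

```shell
#!/bin/sh
# Report the mode of a Linux bonding interface by reading the status
# file the bonding driver exposes under /proc/net/bonding/.
# ASSUMPTION: the bond is called bond0; pass another path as $1 if not.

bond_mode() {
    # Print the text after "Bonding Mode: ", for example
    # "load balancing (round-robin)" for mode=0 or
    # "fault-tolerance (active-backup)" for mode=1.
    sed -n 's/^Bonding Mode: //p' "${1:-/proc/net/bonding/bond0}"
}

is_active_backup() {
    # Succeeds only when the reported mode is active-backup (mode=1).
    bond_mode "$1" | grep -q 'active-backup'
}

STATUS_FILE=${1:-/proc/net/bonding/bond0}
if [ -r "$STATUS_FILE" ]; then
    if is_active_backup "$STATUS_FILE"; then
        echo "bond is in mode=1 (active-backup)"
    else
        echo "bond is NOT in mode=1; current mode: $(bond_mode "$STATUS_FILE")"
    fi
else
    echo "cannot read $STATUS_FILE" >&2
fi
```

Run it on both nodes; each side of the bond should report active-backup
once the change has been made.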
So, switch your bonding to mode=1 and then follow the instructions to
resolve a split-brain.
http://www.drbd.org/users-guide-8.3/s-resolve-split-brain.html
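As a rough sketch of what that guide describes for 8.3, assuming r0 is the
split-brained resource (drbd0 in your logs) and that you have already
decided which node's changes are to be thrown away (the "victim"). That
choice is yours to make after checking both nodes. The helper name
sb_commands is hypothetical, and it only prints the commands so they can be
reviewed before you run them by hand:

```shell
#!/bin/sh
# Sketch of manual split-brain recovery for DRBD 8.3.
# ASSUMPTION: the affected resource is r0. Change RES if it is not.
# Nothing here touches DRBD; the commands are printed for review only.

RES=r0

sb_commands() {
    # $1 is the role of the node you are on: victim | survivor
    case "$1" in
        victim)
            # The victim gives up its changes and resyncs from the peer.
            echo "drbdadm secondary $RES"
            echo "drbdadm -- --discard-my-data connect $RES"
            ;;
        survivor)
            # Only needed if the survivor is also in StandAlone.
            echo "drbdadm connect $RES"
            ;;
        *)
            echo "usage: sb_commands victim|survivor" >&2
            return 1
            ;;
    esac
}

sb_commands victim
sb_commands survivor
```

The victim has to release the device before it can be demoted, so stop the
VMs and deactivate the LVM volumes that sit on top of it on that node first.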
Once this is sorted out, configure and use actual fencing (STONITH in
Pacemaker terms).
digimer
On 06/15/2013 04:29 AM, cesar wrote:
> Hello everyone
>
> *Urgent, please: my servers are in production*
>
> I have a serious problem and need help
>
> *My scenario*
> - I have two ASUS P8H77-M PRO workstations with Intel Core i7, Proxmox VE
> 2.3, DRBD 8.3.10, LVM on top of DRBD
> - 2 Realtek RTL8111/8168 PCI-E 1 Gb/s NICs in a round-robin bond, used only
> for DRBD
>
> And after a while it shows me this:
>
> shell# cat /proc/drbd
> version: 8.3.13 (api:88/proto:86-96)
> GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
>     ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>  1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>     ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>
> *This is my configuration:*
>
> *File global_common.conf:*
> global {
>     usage-count no;
> }
>
> common {
>     protocol C;
>
>     handlers {
>         pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>     }
>
>     startup {
>     }
>
>     disk {
>         on-io-error detach;
>     }
>
>     net {
>         sndbuf-size 0;
>         no-tcp-cork;
>         unplug-watermark 16;
>         max-buffers 8000;
>         max-epoch-size 8000;
>         data-integrity-alg sha1;
>     }
>
>     syncer {
>         rate 75M;
>         al-extents 3389;
>         cpu-mask 0;
>         verify-alg "sha1";
>     }
> }
>
> *File r0.res:*
> resource r0 {
>     protocol C;
>     startup {
>         wfc-timeout 15;
>         degr-wfc-timeout 60;
>         become-primary-on both;
>     }
>     net {
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>     on kvm5 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.2.2.50:7788;
>         meta-disk internal;
>     }
>     on kvm6 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.2.2.51:7788;
>         meta-disk internal;
>     }
> }
>
> *File r1.res:*
> resource r1 {
>     protocol C;
>     startup {
>         wfc-timeout 15;
>         degr-wfc-timeout 60;
>         become-primary-on both;
>     }
>     net {
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>     on kvm5 {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 10.2.2.50:7789;
>         meta-disk internal;
>     }
>     on kvm6 {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 10.2.2.51:7789;
>         meta-disk internal;
>     }
> }
>
> *Note:*
> I use "data-integrity-alg sha1" in the net section because data integrity
> is very important to me
>
> *These are my logs:*
>
> *Log in node A:*
> Jun 14 08:07:28 kvm5 kernel: dlm: connecting to 4
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21158352s +4096
> Jun 14 08:50:12 kvm5 kernel: block drbd0: sock was reset by peer
> Jun 14 08:50:12 kvm5 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Jun 14 08:50:12 kvm5 kernel: block drbd0: short read expecting header on sock: r=-104
> Jun 14 08:50:12 kvm5 kernel: block drbd0: meta connection shut down by peer.
> Jun 14 08:50:12 kvm5 kernel: block drbd0: new current UUID 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
> Jun 14 08:50:12 kvm5 kernel: block drbd0: asender terminated
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Connection closed
> Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
> Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver terminated
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Restarting receiver thread
> Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver (re)started
> Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1847])
> Jun 14 08:50:13 kvm5 kernel: block drbd0: data-integrity-alg: sha1
> Jun 14 08:50:13 kvm5 kernel: block drbd0: drbd_sync_handshake:
> Jun 14 08:50:13 kvm5 kernel: block drbd0: self 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: peer CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: error receiving ReportState, l: 4!
> Jun 14 08:50:13 kvm5 kernel: block drbd0: asender terminated
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Connection closed
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: receiver terminated
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating receiver thread
>
> *Log in node B:*
> Jun 14 08:07:28 kvm6 kernel: dlm: Using TCP for communications
> Jun 14 08:07:28 kvm6 kernel: dlm: got connection from 3
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Digest integrity check FAILED: 21158352s +4096
> Jun 14 08:50:12 kvm6 kernel: block drbd0: error receiving Data, l: 4140!
> Jun 14 08:50:12 kvm6 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> Jun 14 08:50:12 kvm6 kernel: block drbd0: new current UUID CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
> Jun 14 08:50:12 kvm6 kernel: block drbd0: asender terminated
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Connection closed
> Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( ProtocolError -> Unconnected )
> Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver terminated
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Restarting receiver thread
> Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver (re)started
> Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1857])
> Jun 14 08:50:13 kvm6 kernel: block drbd0: data-integrity-alg: sha1
> Jun 14 08:50:13 kvm6 kernel: block drbd0: drbd_sync_handshake:
> Jun 14 08:50:13 kvm6 kernel: block drbd0: self CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: peer 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: meta connection shut down by peer.
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: asender terminated
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( NetworkFailure -> Disconnecting )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: error receiving ReportState, l: 4!
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Connection closed
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: receiver terminated
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating receiver thread
>
>
>
> I will be extremely grateful to anyone who can help me
>
> Best regards
> Cesar
>
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?