First thing that jumps out at me is that the round-robin bonding is not
supported. Only mode=1 (Active/Passive) is.

Secondly, you do not have fencing, so when the network error occurred,
you got a split brain;

> Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but
> unresolved, dropping connection!

So, switch your bonding to mode=1 and then follow the instructions to
resolve a split-brain.

http://www.drbd.org/users-guide-8.3/s-resolve-split-brain.html

Once this is sorted out, configure and use actual fencing (stonith in
pacemaker terms).

digimer

On 06/15/2013 04:29 AM, cesar wrote:
> Hello everyone
>
> *Please Urgent, my servers are in production*
>
> I am in a serious problem and need help
>
> *My my scenario*
> - I have two workstations ASUS P8H77-M PRO with Intel core I7, Proxmox VE
>   2.3, DRBD 8.3.10, LVM on top of DRBD
> - 2 NICs Realtek RTL8111/8168 PCI-E of 1 Gb/s in bond round robin only for
>   use with DRBD
>
> And after awhile it shows me this:
>
> shell#cat /proc/drbd
> version: 8.3.13 (api:88/proto:86-96)
> GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
>  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
>     ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>  1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>     ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>
> *This is my configuration:*
>
> File global_common.conf:
> global { usage-count no;
> }
>
> common {
>     protocol C;
>
>     handlers {
>         pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>         local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>     }
>
>     startup {
>     }
>
>     disk { on-io-error detach;
>     }
>
>     net { sndbuf-size 0; no-tcp-cork; unplug-watermark 16; max-buffers 8000; max-epoch-size 8000;
>         data-integrity-alg sha1;
>     }
>
>     syncer { rate 75M; al-extents 3389; cpu-mask 0; verify-alg "sha1";
>     }
> }
>
> *File r0.res:*
> resource r0 {
>     protocol C;
>     startup {
>         wfc-timeout 15;
>         degr-wfc-timeout 60;
>         become-primary-on both;
>     }
>     net {
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>     on kvm5 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.2.2.50:7788;
>         meta-disk internal;
>     }
>     on kvm6 {
>         device /dev/drbd0;
>         disk /dev/sda3;
>         address 10.2.2.51:7788;
>         meta-disk internal;
>     }
> }
>
> *File r1.res:*
> resource r1 {
>     protocol C;
>     startup {
>         wfc-timeout 15;
>         degr-wfc-timeout 60;
>         become-primary-on both;
>     }
>     net {
>         allow-two-primaries;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>     }
>     on kvm5 {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 10.2.2.50:7789;
>         meta-disk internal;
>     }
>     on kvm6 {
>         device /dev/drbd1;
>         disk /dev/sdb3;
>         address 10.2.2.51:7789;
>         meta-disk internal;
>     }
> }
>
> *Note:*
> I use on the directive net "data-integrity-alg sha1"; because for me is very
> important the data
>
> *This is my logs:*
>
> *Log in Node A:*
> Jun 14 08:07:28 kvm5 kernel: dlm: connecting to 4
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21158352s +4096
> Jun 14 08:50:12 kvm5 kernel: block drbd0: sock was reset by peer
> Jun 14 08:50:12 kvm5 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Jun 14 08:50:12 kvm5 kernel: block drbd0: short read expecting header on sock: r=-104
> Jun 14 08:50:12 kvm5 kernel: block drbd0: meta connection shut down by peer.
> Jun 14 08:50:12 kvm5 kernel: block drbd0: new current UUID 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
> Jun 14 08:50:12 kvm5 kernel: block drbd0: asender terminated
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Connection closed
> Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
> Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver terminated
> Jun 14 08:50:12 kvm5 kernel: block drbd0: Restarting receiver thread
> Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver (re)started
> Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1847])
> Jun 14 08:50:13 kvm5 kernel: block drbd0: data-integrity-alg: sha1
> Jun 14 08:50:13 kvm5 kernel: block drbd0: drbd_sync_handshake:
> Jun 14 08:50:13 kvm5 kernel: block drbd0: self 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: peer CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: error receiving ReportState, l: 4!
> Jun 14 08:50:13 kvm5 kernel: block drbd0: asender terminated
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Connection closed
> Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Jun 14 08:50:13 kvm5 kernel: block drbd0: receiver terminated
> Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating receiver thread
>
> *Log in node B:*
> Jun 14 08:07:28 kvm6 kernel: dlm: Using TCP for communications
> Jun 14 08:07:28 kvm6 kernel: dlm: got connection from 3
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Digest integrity check FAILED: 21158352s +4096
> Jun 14 08:50:12 kvm6 kernel: block drbd0: error receiving Data, l: 4140!
> Jun 14 08:50:12 kvm6 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> Jun 14 08:50:12 kvm6 kernel: block drbd0: new current UUID CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
> Jun 14 08:50:12 kvm6 kernel: block drbd0: asender terminated
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Connection closed
> Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( ProtocolError -> Unconnected )
> Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver terminated
> Jun 14 08:50:12 kvm6 kernel: block drbd0: Restarting receiver thread
> Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver (re)started
> Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1857])
> Jun 14 08:50:13 kvm6 kernel: block drbd0: data-integrity-alg: sha1
> Jun 14 08:50:13 kvm6 kernel: block drbd0: drbd_sync_handshake:
> Jun 14 08:50:13 kvm6 kernel: block drbd0: self CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: peer 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Jun 14 08:50:13 kvm6 kernel: block drbd0: meta connection shut down by peer.
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: asender terminated
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating asender thread
> Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( NetworkFailure -> Disconnecting )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: error receiving ReportState, l: 4!
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Connection closed
> Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Jun 14 08:50:13 kvm6 kernel: block drbd0: receiver terminated
> Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating receiver thread
>
> I will be extremely grateful to anyone who can help me
>
> Best regards
> Cesar
>
> --
> View this message in context: http://drbd.10923.n7.nabble.com/Replication-problems-constants-with-DRBD-8-3-10-tp17896.html
> Sent from the DRBD - User mailing list archive at Nabble.com.
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
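
As a rough aid for spotting this state early, here is a minimal Python sketch that scans /proc/drbd-style text and lists the minors whose connection state is StandAlone, as drbd0 is in the output quoted in this thread. It is not part of DRBD; the names parse_proc_drbd and disconnected_minors are made up for illustration, and it assumes the 8.3 status-line layout (cs:, ro:, ds:) seen in that output.

```python
import re

# Matches a per-resource status line in the DRBD 8.3 /proc/drbd layout, e.g.
# " 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_proc_drbd(text):
    """Return one dict per DRBD minor found in /proc/drbd-style text."""
    resources = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            entry = match.groupdict()
            entry["minor"] = int(entry["minor"])
            resources.append(entry)
    return resources


def disconnected_minors(text):
    """Return the minors whose connection state is StandAlone."""
    return [r["minor"] for r in parse_proc_drbd(text) if r["cs"] == "StandAlone"]


# Sample taken from the /proc/drbd output quoted in this thread.
SAMPLE = """\
version: 8.3.13 (api:88/proto:86-96)
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

if __name__ == "__main__":
    print(disconnected_minors(SAMPLE))
```

On a live node the text would come from reading /proc/drbd instead of SAMPLE. A StandAlone resource only tells you that replication has stopped; the kernel log still has to confirm whether a split brain was the cause.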