[DRBD-user] split brain issues

Yannis Milios yannis.milios at gmail.com
Thu Jan 22 17:51:52 CET 2015


Hello Richard,

Not sure what caused the Inconsistent state on your DRBD resource on Node1,
but my guess is that it experienced some kind of low-level corruption on its
backing storage (hard disks) and an automatic sync was initiated from Node2.
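
You can verify the current state on both nodes before acting on it (a quick
check, assuming the resource is named r1 as in your config):

        # overall status: cs: (connection state), ro: (roles), ds: (disk states)
        cat /proc/drbd

        # the same per resource, via drbdadm
        drbdadm cstate r1
        drbdadm dstate r1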

Are you using Linux software RAID (md)? Then you probably don't have a
battery-backed RAID controller, so it would be wise to remove the following
entries, since they can cause data loss in your case:

                no-disk-flushes;
                no-md-flushes;
                no-disk-barrier;
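
With those three lines removed the disk section can simply be left empty, so
DRBD falls back to its defaults (flushes and barriers enabled), which is the
safe choice without a battery-backed write cache. A sketch based on your
posted config:

                disk {
                        # no-disk-flushes / no-md-flushes / no-disk-barrier removed:
                        # DRBD now flushes the backing device and its metadata,
                        # so a power loss cannot silently lose writes
                }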

Finally, from my experience Proxmox does not work well with HA enabled (it is
not aware of the underlying DRBD resource), so it can cause frequent
split-brains. Use DRBD without HA enabled on Proxmox, in dual-primary mode
(so you don't lose the live migration capability of Proxmox).
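
If you do hit a split brain again, the usual manual recovery is to pick a
"victim" node whose changes since the split are thrown away, then reconnect.
A sketch of the standard procedure (assuming resource r1 and Node2 as the
victim; on DRBD 8.3 the option goes before the subcommand):

        # on the victim node: its changes since the split brain are discarded
        drbdadm secondary r1
        drbdadm -- --discard-my-data connect r1

        # on the surviving node, only if it reports StandAlone
        drbdadm connect r1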

You can also create separate DRBD resources for each Proxmox node so you can
better handle split brains. For example drbd0 -> drbdvg0 -> always used on
Node1, and drbd1 -> drbdvg1 -> used on Node2 (see the sketch below).
This way you will always know that VMs running on Node1 are located on
drbd0 and VMs running on Node2 are located on drbd1.
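
A minimal sketch of that layout (the second backing device /dev/md3 and the
port numbers are hypothetical; each resource needs its own backing device
and its own TCP port):

resource r0 {        # VMs that normally run on Node1 -> VG drbdvg0
        protocol C;
        on node1 { device /dev/drbd0; disk /dev/md2; address 10.1.5.31:7788; meta-disk internal; }
        on node2 { device /dev/drbd0; disk /dev/md2; address 10.1.5.32:7788; meta-disk internal; }
}

resource r1 {        # VMs that normally run on Node2 -> VG drbdvg1
        protocol C;
        on node1 { device /dev/drbd1; disk /dev/md3; address 10.1.5.31:7789; meta-disk internal; }
        on node2 { device /dev/drbd1; disk /dev/md3; address 10.1.5.32:7789; meta-disk internal; }
}

After a split brain you then know which resource holds the good data for
each set of VMs.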

Regards
Yannis




On Sun, Jan 18, 2015 at 11:42 AM, Lechner Richard <r.lechner at gmx.net> wrote:

> Repost from 13th Jan.
>
> Hello all,
>
> sorry this will be a longer post!
>
> I have had some strange issues for a few weeks. Sometimes DRBD runs into a
> split brain, but I don't really understand why!
> I run a Proxmox cluster with 2 nodes and only one VM is running on the
> first node (Node1), so the other node (Node2) is the HA backup node to
> take over the VM when something happens.
>
> The disks are md devices on both nodes:
>
> Personalities : [raid1]
> md2 : active raid1 sda3[0] sdb3[1]
>       2930129536 blocks super 1.2 [2/2] [UU]
>
>
> DRBD config:
> resource r1 {
>         protocol C;
>         startup {
>                 wfc-timeout  0;
>                 degr-wfc-timeout 60;
>                 become-primary-on both;
>         }
>         net {
>                 sndbuf-size 10M;
>                 rcvbuf-size 10M;
>                 ping-int 2;
>                 ping-timeout 2;
>                 connect-int 2;
>                 timeout 5;
>                 ko-count 5;
>                 max-buffers 128k;
>                 max-epoch-size 8192;
>                 cram-hmac-alg sha1;
>                 shared-secret "XXXXXX";
>                 allow-two-primaries;
>                 after-sb-0pri discard-zero-changes;
>                 after-sb-1pri discard-secondary;
>                 after-sb-2pri disconnect;
>         }
>         on node1 {
>                 device /dev/drbd0;
>                 disk /dev/md2;
>                 address 10.1.5.31:7788;
>                 meta-disk internal;
>         }
>         on node2 {
>                 device /dev/drbd0;
>                 disk /dev/md2;
>                 address 10.1.5.32:7788;
>                 meta-disk internal;
>         }
>         disk {
>                 no-disk-flushes;
>                 no-md-flushes;
>                 no-disk-barrier;
>         }
> }
>
>
> The disks for the VM are LVs and only mounted inside the VM; vm-101-disk-1
> is the VM root fs and disk-2 is the VM mail storage. There is/should be no
> mount or direct access from the nodes!
>
>  --- Logical volume ---
>   LV Path                /dev/drbd0vg/vm-101-disk-1
>   LV Name                vm-101-disk-1
>   VG Name                drbd0vg
>   LV Size                75,00 GiB
>
>   --- Logical volume ---
>   LV Path                /dev/drbd0vg/vm-101-disk-2
>   LV Name                vm-101-disk-2
>   VG Name                drbd0vg
>   LV Size                550,00 GiB
>
>
> The nodes don't use/mount any of /dev/drbd0vg/
>
> Filesystem            Size  Used Avail Use% Mounted on
> udev                   10M     0   10M   0% /dev
> tmpfs                 1.6G  504K  1.6G   1% /run
> /dev/mapper/pve-root   78G  3.0G   72G   4% /
> tmpfs                 5.0M  4.0K  5.0M   1% /run/lock
> tmpfs                 3.2G   50M  3.1G   2% /run/shm
> /dev/sdc1             232M   72M  148M  33% /boot
> /dev/fuse              30M   24K   30M   1% /etc/pve
>
> So DRBD runs Primary/Primary, but how can something have changed on the
> second node if nothing is running there and the LVs are not mounted? There
> should be no new data on the DRBD volume on Node2. But the last time I did
> a resync (after I stopped the VM on Node1!) it synced 15 GB from Node2 to
> Node1! Unbelievable!!
> I took a screenshot, but I'm not sure I can attach it here?
>
> The DRBD status on Node1 was:
> Primary/Primary ds:Inconsistent/UpToDate
> So I think the left side is Node1 and the right is Node2? How can Node2 be
> UpToDate? I don't understand this, because Node2 was running nothing with
> access to the LVs!
> There were some filesystem errors inside the VM when it started after the
> sync on Node1. :-(
>
>
> Before, it was a crossover cable, and I wanted to make sure there was no
> problem with it, so on Sunday I installed a switch and normal cables just
> for the DRBD network!
> (But when the break happened I did not see eth1 go down or anything like
> that)
>
> Jan 12 10:49:34 node1 kernel: block drbd0: Remote failed to finish a request within ko-count * timeout
> Jan 12 10:49:34 node1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
> Jan 12 10:49:34 node1 kernel: block drbd0: asender terminated
> Jan 12 10:49:34 node1 kernel: block drbd0: Terminating asender thread
> Jan 12 10:49:34 node1 kernel: block drbd0: new current UUID D4335C79AD0E0BC3:AE406068788B0F3B:95D06B8F4DD0CE03:95CF6B8F4DD0CE03
> Jan 12 10:49:35 node1 kernel: block drbd0: Connection closed
> Jan 12 10:49:35 node1 kernel: block drbd0: conn( Timeout -> Unconnected )
> Jan 12 10:49:35 node1 kernel: block drbd0: receiver terminated
> Jan 12 10:49:35 node1 kernel: block drbd0: Restarting receiver thread
> Jan 12 10:49:35 node1 kernel: block drbd0: receiver (re)started
> Jan 12 10:49:35 node1 kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jan 12 10:49:35 node1 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
> Jan 12 10:49:35 node1 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
> Jan 12 10:49:35 node1 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jan 12 10:49:35 node1 kernel: block drbd0: Starting asender thread (from drbd0_receiver [2840])
> Jan 12 10:49:35 node1 kernel: block drbd0: data-integrity-alg: <not-used>
> Jan 12 10:49:35 node1 kernel: block drbd0: drbd_sync_handshake:
> Jan 12 10:49:35 node1 kernel: block drbd0: self D4335C79AD0E0BC3:AE406068788B0F3B:95D06B8F4DD0CE03:95CF6B8F4DD0CE03 bits:42099 flags:0
> Jan 12 10:49:35 node1 kernel: block drbd0: peer A5FBD7AF4A9FD583:AE406068788B0F3B:95D06B8F4DD0CE03:95CF6B8F4DD0CE03 bits:0 flags:0
> Jan 12 10:49:35 node1 kernel: block drbd0: uuid_compare()=100 by rule 90
> Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Jan 12 10:49:35 node1 kernel: block drbd0: meta connection shut down by peer.
> Jan 12 10:49:35 node1 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
> Jan 12 10:49:35 node1 kernel: block drbd0: asender terminated
> Jan 12 10:49:35 node1 kernel: block drbd0: Terminating asender thread
> Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Jan 12 10:49:35 node1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
> Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> Jan 12 10:49:35 node1 kernel: block drbd0: conn( NetworkFailure -> Disconnecting )
> Jan 12 10:49:35 node1 kernel: block drbd0: error receiving ReportState, l: 4!
> Jan 12 10:49:35 node1 kernel: block drbd0: Connection closed
> Jan 12 10:49:35 node1 kernel: block drbd0: conn( Disconnecting -> StandAlone )
> Jan 12 10:49:35 node1 kernel: block drbd0: receiver terminated
> Jan 12 10:49:35 node1 kernel: block drbd0: Terminating receiver thread
>
>
> grep eth1 /var/log/kern.log
> Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: (PCIe:5.0GT/s:Width x2)
> Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: MAC: f8:0f:41:fb:32:21
> Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: PBA No: 106300-000
> Jan 11 15:10:42 node1 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
> Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> Jan 11 15:10:42 node1 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> Jan 11 15:10:52 node1 kernel: eth1: no IPv6 routers present
>
>
> eth1 is still up, no errors, ping and ssh still working on that interface!
>
> eth1      Link encap:Ethernet  HWaddr f8:0f:41:fb:32:21
>           inet addr:10.1.5.31  Bcast:10.1.5.255  Mask:255.255.255.0
>           inet6 addr: fe80::fa0f:41ff:fefb:3221/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:13129653 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:13383845 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:17271836211 (16.0 GiB)  TX bytes:15506649645 (14.4 GiB)
>
> uname -a
> Linux node1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64
> GNU/Linux
>
>
> On Node2 I don't find any disk error or anything like that!
> So what can be the problem and how can I fix it? I read the documentation,
> and if I understand correctly, automatic split-brain repair is not usable
> in my situation, because I don't know where the VM was last running.
>
> I will try to attach the screenshot as well.
> Any hints?
>
> Regards
>
> Richard
>
> PS: I got Eric's post where he mentions: "The split brain would only happen
> on dual primary."
> So I changed to Primary/Secondary and stopped HA in Proxmox.
> In the last few days no errors have occurred, but I will have to keep an
> eye on this over the next weeks.