[DRBD-user] split brain issues

Lechner Richard r.lechner at gmx.net
Sun Jan 18 10:42:22 CET 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Repost from 13 Jan.

Hello all,

sorry, this will be a longer post!

I have been having some strange issues for a few weeks now. Sometimes DRBD
runs into a split brain, but I don't really understand why!
I run a Proxmox cluster with 2 nodes, and only one VM is running, on the
first node (Node1); the other node (Node2) is the HA backup node that the VM
is switched to when something happens.

The backing disks are md RAID1 devices on both nodes:

Personalities : [raid1] 
md2 : active raid1 sda3[0] sdb3[1]
      2930129536 blocks super 1.2 [2/2] [UU]


DRBD config:
resource r1 {
        protocol C;
        startup {
                wfc-timeout  0; 
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                sndbuf-size 10M;
                rcvbuf-size 10M;
                ping-int 2;
                ping-timeout 2;
                connect-int 2;
                timeout 5;
                ko-count 5;
                max-buffers 128k;
                max-epoch-size 8192;
                cram-hmac-alg sha1;
                shared-secret "XXXXXX";
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        on node1 {
                device /dev/drbd0;
                disk /dev/md2;
                address 10.1.5.31:7788;
                meta-disk internal;
        }
        on node2 {
                device /dev/drbd0;
                disk /dev/md2;
                address 10.1.5.32:7788;
                meta-disk internal;
        }
        disk {
                no-disk-flushes;
                no-md-flushes;
                no-disk-barrier;
        }
}
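
(Looking at the config again while writing this: according to the drbd.conf
man page, timeout is in tenths of a second, so timeout 5 means 0.5s, and with
ko-count 5 the peer is already declared dead when a request stalls for
5 x 0.5s = 2.5 seconds. That matches the "ko-count * timeout" message in the
log further down, so even a short hiccup on the replication link can
disconnect the two primaries, and once both sides have written while
disconnected, the reconnect can only end in a split brain.)

The User's Guide also describes a split-brain notification handler. As a
sketch (untested on my setup; the script is shipped with DRBD), something
like this in the resource should at least send a mail the moment it happens:

        handlers {
                # mail root as soon as a split brain is detected;
                # notify-split-brain.sh ships with DRBD 8.3
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        }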


The disks for the VM are LVs and are mounted only inside the VM; vm-101-disk-1
is the VM's root fs and disk-2 is the VM's mail storage. There is (and should
be) no mount of, or direct access to, them from the nodes!

 --- Logical volume ---
  LV Path                /dev/drbd0vg/vm-101-disk-1
  LV Name                vm-101-disk-1
  VG Name                drbd0vg
  LV Size                75.00 GiB

  --- Logical volume ---
  LV Path                /dev/drbd0vg/vm-101-disk-2
  LV Name                vm-101-disk-2
  VG Name                drbd0vg
  LV Size                550.00 GiB


The nodes don't use or mount anything from /dev/drbd0vg/:

Filesystem            Size  Used Avail Use% Mounted on
udev                   10M       0   10M    0% /dev
tmpfs                 1,6G    504K  1,6G    1% /run
/dev/mapper/pve-root   78G    3,0G   72G    4% /
tmpfs                 5,0M    4,0K  5,0M    1% /run/lock
tmpfs                 3,2G     50M  3,1G    2% /run/shm
/dev/sdc1             232M     72M  148M   33% /boot
/dev/fuse              30M     24K   30M    1% /etc/pve
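
One thing I am not 100% sure about: LVM on the hosts also sees the VG through
the backing device /dev/md2, not only through /dev/drbd0. In case that plays
a role here, a filter in /etc/lvm/lvm.conf that hides the backing device
might be worth setting; a sketch (device paths taken from my setup; the first
matching pattern wins):

        filter = [ "a|^/dev/drbd0$|", "r|^/dev/md2$|", "a|.*|" ]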

So DRBD runs Primary/Primary, but how can anything have changed on the second
node if nothing is running there and the LVs are not mounted? There should be
no new data on the DRBD volume on Node2. But the last time I did a resync
(after I had stopped the VM on Node1!), it synced 15 GB from Node2 to Node1!
Unbelievable!!
I took a screenshot, but I'm not sure whether I can attach it here.

The DRBD status on Node1 was:
Primary/Primary ds:Inconsistent/UpToDate
So I think the left side is Node1 and the right side is Node2? How can Node2
be UpToDate? I don't understand this, because nothing was running on Node2
with access to the LVs!
I also had some filesystem errors inside the VM when it started after the
sync on Node1. :-(
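
(From what I have read, the value left of the slash in /proc/drbd is always
the local node and the value right of it is the peer, so the status above
means Node1 considered itself Inconsistent and saw Node2 as UpToDate, i.e.
Node1 was the sync target. An illustrative /proc/drbd line, not my real
output:

  0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r----

So during that resync DRBD took Node2 as the source, which is exactly what I
did not want.)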


Before, it was a crossover cable, and I wanted to make sure that was not the
problem, so on Sunday I installed a switch and normal cables, used only for
the DRBD network! (But when the break happened, I did not see eth1 go down or
anything like that.)

Jan 12 10:49:34 node1 kernel: block drbd0: Remote failed to finish a request 
within ko-count * timeout
Jan 12 10:49:34 node1 kernel: block drbd0: peer( Primary -> Unknown ) conn( 
Connected -> Timeout ) pdsk( UpToDate -> DUnknown ) 
Jan 12 10:49:34 node1 kernel: block drbd0: asender terminated
Jan 12 10:49:34 node1 kernel: block drbd0: Terminating asender thread
Jan 12 10:49:34 node1 kernel: block drbd0: new current UUID 
D4335C79AD0E0BC3:AE406068788B0F3B:95D06B8F4DD0CE03:95CF6B8F4DD0CE03
Jan 12 10:49:35 node1 kernel: block drbd0: Connection closed
Jan 12 10:49:35 node1 kernel: block drbd0: conn( Timeout -> Unconnected ) 
Jan 12 10:49:35 node1 kernel: block drbd0: receiver terminated
Jan 12 10:49:35 node1 kernel: block drbd0: Restarting receiver thread
Jan 12 10:49:35 node1 kernel: block drbd0: receiver (re)started
Jan 12 10:49:35 node1 kernel: block drbd0: conn( Unconnected -> WFConnection ) 
Jan 12 10:49:35 node1 kernel: block drbd0: Handshake successful: Agreed 
network protocol version 96
Jan 12 10:49:35 node1 kernel: block drbd0: Peer authenticated using 20 bytes 
of 'sha1' HMAC
Jan 12 10:49:35 node1 kernel: block drbd0: conn( WFConnection -> 
WFReportParams ) 
Jan 12 10:49:35 node1 kernel: block drbd0: Starting asender thread (from 
drbd0_receiver [2840])
Jan 12 10:49:35 node1 kernel: block drbd0: data-integrity-alg: <not-used>
Jan 12 10:49:35 node1 kernel: block drbd0: drbd_sync_handshake:
Jan 12 10:49:35 node1 kernel: block drbd0: self 
D4335C79AD0E0BC3:AE406068788B0F3B:95D06B8F4DD0CE03:95CF6B8F4DD0CE03 bits:42099 
flags:0
Jan 12 10:49:35 node1 kernel: block drbd0: peer 
A5FBD7AF4A9FD583:AE406068788B0F3B:95D06B8F4DD0CE03:95CF6B8F4DD0CE03 bits:0 
flags:0
Jan 12 10:49:35 node1 kernel: block drbd0: uuid_compare()=100 by rule 90
Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm 
initial-split-brain minor-0
Jan 12 10:49:35 node1 kernel: block drbd0: meta connection shut down by peer.
Jan 12 10:49:35 node1 kernel: block drbd0: conn( WFReportParams -> 
NetworkFailure ) 
Jan 12 10:49:35 node1 kernel: block drbd0: asender terminated
Jan 12 10:49:35 node1 kernel: block drbd0: Terminating asender thread
Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm 
initial-split-brain minor-0 exit code 0 (0x0)
Jan 12 10:49:35 node1 kernel: block drbd0: Split-Brain detected but 
unresolved, dropping connection!
Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm 
split-brain minor-0
Jan 12 10:49:35 node1 kernel: block drbd0: helper command: /sbin/drbdadm 
split-brain minor-0 exit code 0 (0x0)
Jan 12 10:49:35 node1 kernel: block drbd0: conn( NetworkFailure -> 
Disconnecting ) 
Jan 12 10:49:35 node1 kernel: block drbd0: error receiving ReportState, l: 4!
Jan 12 10:49:35 node1 kernel: block drbd0: Connection closed
Jan 12 10:49:35 node1 kernel: block drbd0: conn( Disconnecting -> StandAlone ) 
Jan 12 10:49:35 node1 kernel: block drbd0: receiver terminated
Jan 12 10:49:35 node1 kernel: block drbd0: Terminating receiver thread
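
If I read the split-brain recovery chapter of the User's Guide right, the
manual resolution for the next time, keeping Node1's data and discarding
Node2's changes, would be roughly this (a sketch; the double-dash form is the
8.3 syntax, newer versions write "drbdadm connect --discard-my-data r1"):

        # on Node2, the split-brain victim whose changes are thrown away:
        drbdadm secondary r1
        drbdadm -- --discard-my-data connect r1

        # on Node1, the survivor (only needed if it is StandAlone):
        drbdadm connect r1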


grep eth1 /var/log/kern.log
Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: (PCIe:5.0GT/s:Width x2) 
Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: MAC: f8:0f:41:fb:32:21
Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: PBA No: 106300-000
Jan 11 15:10:42 node1 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Jan 11 15:10:42 node1 kernel: igb 0000:07:00.1: eth1: igb: eth1 NIC Link is Up 
1000 Mbps Full Duplex, Flow Control: RX/TX
Jan 11 15:10:42 node1 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes 
ready
Jan 11 15:10:52 node1 kernel: eth1: no IPv6 routers present


eth1 is still up with no errors; ping and ssh still work on that interface!

eth1      Link encap:Ethernet  HWaddr f8:0f:41:fb:32:21
          inet addr:10.1.5.31  Bcast:10.1.5.255  Mask:255.255.255.0
          inet6 addr: fe80::fa0f:41ff:fefb:3221/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13129653 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13383845 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:17271836211 (16.0 GiB)  TX bytes:15506649645 (14.4 GiB)

uname -a
Linux node1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux


On Node2 I can't find any disk errors or anything like that!
So what can the problem be, and how can I fix it? I have read the
documentation, and if I understand it correctly, automatic split-brain repair
is not usable in my situation, because I don't know where the VM was last
running.
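
In the meantime I want to at least detect such divergence early. If I
understand the online-verify chapter correctly, this needs a verify-alg in
the net section and is then started by hand (a sketch):

        # added to the net section of r1:
        #         verify-alg md5;

        # then, on one node:
        drbdadm verify r1
        # progress shows up in /proc/drbd; out-of-sync blocks are logged to
        # the kernel log and are resynced after a disconnect/connect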

I'll try to attach the screenshot as well.
Any hints?

Regards

Richard

PS: I saw Eric's post where he mentions: "The split brain would only happen
on dual primary."
So I changed to Primary/Secondary and stopped the HA in Proxmox.
In the last few days no errors have occurred, but I will have to keep
observing this over the next weeks.
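
And if I ever go back to dual primary, I guess I also need real fencing, so
that a node which loses the replication link cannot keep writing. The example
from the guide would look like this (a sketch; the helper scripts shipped
with DRBD assume Pacemaker, so Proxmox HA would need an equivalent fence-peer
script):

        disk {
                fencing resource-and-stonith;
        }
        handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }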


-------------- next part --------------
A non-text attachment was scrubbed...
Name: node1.png
Type: image/png
Size: 142332 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20150118/f92ba091/attachment.png>

