On Sun, Jan 18, 2015 at 10:42:22AM +0100, Lechner Richard wrote:
> Repost from 13ten Jan.
>
> Hello all,
>
> sorry this will be a longer post!
>
> I have some strange issues since a few weeks. Sometimes drbd running into a
> split brain but i not real understand why!
> I run a proxmox cluster with 2 nodes and only one VM is running on the first
> node (Node1), so the other node (Node2) is the HA-backupnode to switch the VM
> when something happen.
>
> The disc's are md's on both nodes:
>
> Personalities : [raid1]
> md2 : active raid1 sda3[0] sdb3[1]
>       2930129536 blocks super 1.2 [2/2] [UU]
>
> Drbd-Config:
> resource r1 {
>     protocol C;
>     startup {
>         wfc-timeout 0;
>         degr-wfc-timeout 60;
>         become-primary-on both;
>     }
>     net {
>         sndbuf-size 10M;
>         rcvbuf-size 10M;
>         ping-int 2;
>         ping-timeout 2;
>         connect-int 2;
>         timeout 5;
>         ko-count 5;

The unit of "timeout" is 0.1 seconds, so "timeout 5" is 0.5 seconds per
request, and with "ko-count 5" that is only 2.5 seconds in total before
the peer is declared failed. Peak latencies of that size are not unheard
of. They can happen on a busy system, on a busy network, on a busy IO
subsystem, or even just on a lazy day...

That is simply too aggressively configured.

> Jan 12 10:49:34 node1 kernel: block drbd0: Remote failed to finish a request
> within ko-count * timeout

Timeout. "Surprise."

> PS: I get Eric's post where he mention: "The split brain would only happen on
> dual primary."
> So i changed to Primary/Secondary and stoped the HA in Proxmox.

Most "HA" in "Proxmox" I came across over the years is very much
misconfigured and works only by accident in good weather conditions.

But being generous with the timeouts would already help.

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
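[List-archive note: for comparison, a more generous net section could look
like the sketch below. The values shown are my understanding of the drbd 8.4
defaults, not settings recommended anywhere in this thread; check drbd.conf(5)
for your version before applying anything.]

```
net {
    # "timeout" is in units of 0.1 seconds: 60 => 6 seconds per request,
    # versus the 0.5 seconds ("timeout 5") criticized above.
    timeout      60;

    # Peer is declared dead only after ko-count * timeout without progress,
    # i.e. 7 * 6 s = 42 s here, instead of 5 * 0.5 s = 2.5 s.
    ko-count     7;

    ping-int     10;   # seconds between DRBD keep-alive pings
    ping-timeout 5;    # 0.1 s units => 0.5 seconds to answer a ping
    connect-int  10;   # seconds between reconnection attempts
}
```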