On Sun, Jan 18, 2015 at 10:42:22AM +0100, Lechner Richard wrote:
> Repost from Jan 13th.
>
> Hello all,
>
> sorry this will be a longer post!
>
> I have been having some strange issues for a few weeks now: sometimes DRBD runs into a
> split brain, and I don't really understand why.
> I run a Proxmox cluster with two nodes, and only one VM is running, on the first
> node (Node1), so the other node (Node2) is the HA backup node the VM switches to
> when something happens.
>
> The disks are md RAID devices on both nodes:
>
> Personalities : [raid1]
> md2 : active raid1 sda3[0] sdb3[1]
> 2930129536 blocks super 1.2 [2/2] [UU]
>
>
> Drbd-Config:
> resource r1 {
> protocol C;
> startup {
> wfc-timeout 0;
> degr-wfc-timeout 60;
> become-primary-on both;
> }
> net {
> sndbuf-size 10M;
> rcvbuf-size 10M;
> ping-int 2;
> ping-timeout 2;
> connect-int 2;
> timeout 5;
> ko-count 5;
The unit of timeout is 0.1 seconds, so timeout 5 means 0.5 seconds,
and with ko-count 5 that is 2.5 seconds total before DRBD gives up on the peer.
Peak latencies of that magnitude are not unheard of. They can happen on a busy system,
on a busy network, on a busy IO subsystem, or even just on a lazy day...
That's simply too aggressively configured.
> Jan 12 10:49:34 node1 kernel: block drbd0: Remote failed to finish a request
> within ko-count * timeout
Timeout.
"Surprise"
> PS: I saw Eric's post where he mentions: "The split brain would only happen on
> dual primary."
> So I changed to Primary/Secondary and stopped the HA in Proxmox.
Most "HA" in "Proxmox" I have come across over the years is very much
misconfigured and works only by accident, in good weather conditions.
But being generous with timeouts would help already.
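
For illustration only, a more forgiving net section could look like the
below. These values happen to be close to the DRBD 8.x defaults; they are
not a recommendation tuned to your hardware, so adjust to your environment:

    net {
        ping-int     10;   # seconds between keep-alive pings
        ping-timeout 10;   # unit: 0.1 s, so 1 second
        connect-int  10;   # seconds between connection attempts
        timeout      60;   # unit: 0.1 s, so 6 seconds per request
        ko-count     7;    # 7 * 6 s = 42 s before the peer is declared dead
    }

The point is that ko-count * timeout should comfortably exceed any latency
spike your IO stack and network can plausibly produce.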
--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed