[DRBD-user] Primary/Unknown, Secondary/Unknown state: normal behaviour?

Mon Jun 16 22:07:15 CEST 2008

Doro,

> Hello,
>
> We have set up our first HA 2-node cluster and are currently running
> soak tests (private DRBD replication link, private heartbeat link and
> 2nd heartbeat via DRBD replication link, no dopd, active/passive). We
> have seen several times that after a fail-over test (e.g. unplugging
> the DRBD replication link),

Unplugging the DRBD replication link is not a failover test. What failover
tests _did_ you run?

> DRBD is in a state of Primary/Unknown, Secondary/Unknown and
> will not synchronize. We have to issue several drbdadm commands,
> such as
> ha1# drbdadm secondary drbd1
> ha2# drbdadm detach drbd1
> ha2# drbdadm -- --discard-my-data connect drbd1
>
> which sometimes got it back. Our question is whether it is normal
> behaviour for DRBD to end up in such a state in the first place

No, it is a consequence of a misconfigured cluster communication link and/or
missing outdate-peer functionality. See
http://www.drbd.org/users-guide/s-outdate.html for some background on
this.

> and what is the recommended recovery?

See http://www.drbd.org/users-guide/s-resolve-split-brain.html
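For reference, the manual procedure on that page boils down to a couple
of drbdadm calls per node. The sketch below wraps them in a hypothetical
POSIX sh function (recover_split_brain and DRY_RUN are made-up names, not
DRBD tools) that only prints the commands unless DRY_RUN=0 is set.
Choosing the victim is up to you: its changes since the split brain are
thrown away.

```shell
# Hypothetical helper, not part of DRBD: prints (or, with DRY_RUN=0, runs)
# the manual split-brain recovery steps for one node.
#   recover_split_brain victim RESOURCE    # node whose changes are discarded
#   recover_split_brain survivor RESOURCE  # node whose data is kept

run() {
    # Default is a dry run: just echo the command line.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_split_brain() {
    role="$1"
    res="$2"
    case "$role" in
        victim)
            # Demote the node, then reconnect while discarding the data
            # it modified after the split brain.
            run drbdadm secondary "$res" || return 1
            run drbdadm -- --discard-my-data connect "$res" || return 1
            ;;
        survivor)
            # Only needed if the surviving node is also StandAlone.
            run drbdadm connect "$res" || return 1
            ;;
        *)
            echo "usage: recover_split_brain victim|survivor RESOURCE" >&2
            return 2
            ;;
    esac
}
```

On the victim you would then run something like
`DRY_RUN=0 recover_split_brain victim drbd1`, and the survivor variant on
the other node only if it is StandAlone as well.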

> Is there any timeout to tune to improve this?

At this point you don't need to look into timeouts. It may be wise to take
a peek at http://www.drbd.org/users-guide/s-heartbeat-dopd.html, however.
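Roughly, enabling dopd comes down to two small config changes. This is a
sketch that assumes DRBD 8.0/8.2 syntax and Heartbeat installed under
/usr/lib/heartbeat, with drbd1 as the resource name. The binary paths
(and, in later DRBD versions, the handler name) vary between versions and
distributions, so check them against the guide and your packages.

```
# /etc/ha.d/ha.cf on both nodes: have Heartbeat start dopd
# (paths assumed; adjust to your installation)
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf: resource-level fencing for drbd1
resource drbd1 {
  disk {
    fencing resource-only;
  }
  handlers {
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
  }
  # ... existing net/syncer/on sections unchanged ...
}
```

After editing both files, Heartbeat needs to pick up the new ha.cf
(e.g. via a reload), and `drbdadm adjust drbd1` on both nodes applies the
changed DRBD settings.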

Cheers,
Florian