Doro,

> Hello,
>
> We have set up our first HA 2-node cluster and are currently running
> soak tests (private DRBD replication link, private heartbeat link and
> 2nd heartbeat via DRBD replication link, no dopd, active/passive). We
> have experienced several times that after a fail-over test, e.g.
> unplugging the DRBD replication link,

Unplugging the DRBD replication link is not a failover test. What
failover tests _did_ you run?

> DRBD is in a state of Primary/Unknown, Secondary/Unknown and
> will not synchronize. We have to issue several drbdadm commands
> like
> ha1# drbdadm secondary drbd1
> ha2# drbdadm detach drbd1
> ha2# drbdadm -- --discard-my-data connect drbd1
>
> which sometimes got it back. Our question is whether this is normal
> behaviour of DRBD to end up in such a state in the first place

No, it is a consequence of a misconfigured cluster communication link
and/or missing outdate-peer functionality. See
http://www.drbd.org/users-guide/s-outdate.html for some background on
this.

> and what is the recommended recovery?

See http://www.drbd.org/users-guide/s-resolve-split-brain.html

> Is there any timeout to tune to improve this?

At this point you don't need to look into timeouts. It may be wise to
take a peek at http://www.drbd.org/users-guide/s-heartbeat-dopd.html,
however.

Cheers,
Florian
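
For orientation, the outdate-peer/dopd setup that the last link above
describes can be sketched roughly as follows. This is a minimal sketch
under assumptions, not the poster's actual configuration: the resource
name drbd1 is taken from the commands quoted above, the
/usr/lib/heartbeat paths vary by distribution and architecture, and the
handler is called outdate-peer only in older DRBD 8 releases (it was
later renamed fence-peer), so check the guide for the installed
version.

```
# /etc/ha.d/ha.cf (Heartbeat): start dopd and allow it to use the API
# (paths and user/group names are assumptions; adjust to your install)
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf: let DRBD ask dopd to outdate the peer when the
# replication link is lost
resource drbd1 {
  handlers {
    # named "fence-peer" in later DRBD releases
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
  }
  disk {
    fencing resource-only;
  }
  # ... remaining resource settings unchanged ...
}
```

dopd reaches the peer over the Heartbeat communication paths, so it can
only help if at least one of them is independent of the DRBD
replication link.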