[DRBD-user] Avoid split brain in a dual primary configuration with intelligent switches

Tue Mar 17 16:06:42 CET 2009

Mario Giammarco <mgiammarco at ...> writes:
> My idea (correct me if I am wrong) is this: when one primary finds that it
> cannot talk to the other primary, it tries to ping the switches. If it cannot
> ping the switches, it means that all its Ethernet cards or all the switches are
> broken, so it shuts itself down. If it can ping the switches, it means that the
> other primary is broken, so it tries to STONITH it. After reconnection, it is
> clear that the primary that could not ping the switches must resync with the
> other one.
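
The decision rule in the quoted idea can be written down as a small Python
sketch. All names here (Action, decide, the reachability flags) are
hypothetical and only illustrate the proposal; nothing in it is a DRBD or
cluster-manager API, and the actual ping and STONITH steps are left out.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    CONTINUE = "continue"              # peer is reachable, nothing to do
    SHUT_DOWN_SELF = "shut down self"  # this node looks isolated
    STONITH_PEER = "stonith peer"      # the peer looks dead, fence it


def decide(peer_reachable: bool, switches_reachable: Sequence[bool]) -> Action:
    """Apply the proposed rule on one primary.

    switches_reachable holds one ping result per switch (hypothetical input).
    """
    if peer_reachable:
        return Action.CONTINUE
    if not any(switches_reachable):
        # No switch answers: assume all local NICs or all switches are broken.
        return Action.SHUT_DOWN_SELF
    # At least one switch answers, so the fault is assumed to be on the peer.
    return Action.STONITH_PEER
```

For example, decide(False, [False, False]) gives Action.SHUT_DOWN_SELF, while
decide(False, [True, False]) gives Action.STONITH_PEER.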

I am replying to myself. Re-reading the DRBD documentation, I finally found the
write quorum explanation, and I discovered why, using dual primary, I always
get a split brain after a disconnection.

Now I do not understand two things:

- why single primary mode (master slave) does not need a write quorum;
- how dual primary works. My guess is the following (protocol C):

Good communication:
server A receives a request to write a disk block;
server A writes it to disk;
A sends it to B;
B writes it and sends an ack;
A receives the ack and tells the upper layer that the write succeeded.

Bad communication:
server A receives a request to write a disk block;
server A writes it to disk;
A sends it to B;
COMMUNICATION FAILURE
B does not receive anything, nor can it reply;
A times out and reports a write error to the upper layer.
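
The two sequences above can be modelled as a toy Python sketch. It encodes only
the behaviour guessed at in this post, not how DRBD is actually implemented;
the Peer and Primary classes and their methods are hypothetical, and the
communication failure is simulated with a flag instead of a real timeout.

```python
class Peer:
    """Server B: stores replicated blocks and acknowledges them."""

    def __init__(self):
        self.reachable = True
        self.disk = {}

    def replicate(self, block, data):
        if not self.reachable:
            # Stands in for "B does not receive anything, nor can it reply".
            raise TimeoutError("no ack from peer")
        self.disk[block] = data
        return True  # the ack


class Primary:
    """Server A: writes locally, then waits for the peer's ack."""

    def __init__(self, peer):
        self.peer = peer
        self.disk = {}

    def write(self, block, data):
        self.disk[block] = data  # A writes the block to its own disk first
        try:
            self.peer.replicate(block, data)  # A sends it to B
        except TimeoutError:
            return False  # A times out: write error for the upper layer
        return True  # ack received: the write is reported as good
```

Note that in the failure case the block is already on A's disk but not on B's,
so under this model the two copies differ even though the upper layer was told
the write failed.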

I suppose that A and B are now blocked (they cannot complete writes) and I (or
the cluster manager) can decide to shut down one server.

My question is: I still do not understand what DRBD really does in this
situation. Is it like the above, or is it different?
The other question is: why does single primary mode (master slave) not need a
write quorum?

Thanks again for your help!