Mario Giammarco <mgiammarco at ...> writes:

> My idea (correct me if I am wrong) is this: when one primary finds that
> it cannot talk to the other primary, it tries to ping the switches. If it
> cannot ping the switches, then all of its Ethernet cards or all of the
> switches are broken, so it shuts itself down. If it can ping the
> switches, then the other primary is broken, so it tries to STONITH it.
> After reconnection it is clear that the primary that could not ping the
> switches must resync with the other one.

I am replying to myself. Re-reading the drbd documentation, I finally found the write quorum explanation, and I discovered why, using dual primary, I always get a split brain after a disconnection.

Now there are two things I do not understand:

- why single primary mode (master/slave) does not need a write quorum;
- how dual primary works.

This is how I think it works (mode C).

Good communication:
  1. Server A receives an order to write a disk block.
  2. A writes it to disk.
  3. A sends it to B.
  4. B writes it and sends an ack.
  5. A receives the ack and tells the upper layer that the write is good.

Bad communication:
  1. Server A receives an order to write a disk block.
  2. A writes it to disk.
  3. A sends it to B.
  4. COMMUNICATION FAILURE: B does not receive anything, nor can it reply.
  5. A times out and reports a write error to the upper layer.

I suppose that at this point A and B are blocked (they cannot complete writes), and I (or the cluster manager) can decide to shut down one server.

My first question is: I still do not understand what drbd really does in this situation. Is it as described above, or is it different?

The other question is the one from the list above: why does single primary mode (master/slave) not need a write quorum?

Thanks again for the help!
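
For reference, the mode C sequence as I described it above can be written down as a small toy model in Python. This is only a sketch of my own understanding, not of what drbd actually does: the names Node, replicated_write and link_up are hypothetical and have nothing to do with drbd's code or configuration, and the "error" result in the failure case is exactly the assumption I am asking about.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one server and its local disk."""
    name: str
    disk: dict = field(default_factory=dict)

    def write_local(self, block: int, data: bytes) -> None:
        self.disk[block] = data


def replicated_write(a: Node, b: Node, block: int, data: bytes, link_up: bool) -> str:
    """Toy model of the sequence in the post; not drbd's real behaviour."""
    a.write_local(block, data)      # A writes the block to its own disk
    if not link_up:
        # B never receives the block and cannot reply, so A times out.
        # Reporting an error here is the assumption made in the post.
        return "error"
    b.write_local(block, data)      # B writes the block and acks
    return "ok"                     # A got the ack and reports success


if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    print(replicated_write(a, b, 7, b"x", link_up=True))
    print(replicated_write(a, b, 8, b"y", link_up=False))
    print(sorted(a.disk), sorted(b.disk))
```

In this model the failed write leaves block 8 on A's disk but not on B's, so the two disks already differ after the first write during a disconnection.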