On 01/13/2012 04:59 AM, Luis M. Carril wrote:
> Hello,
>
> I'm new to DRBD and I think that I have a mess with some concepts and
> policies.

Welcome! DRBD is a bit different from many storage concepts, so it takes a bit to wrap your head around. However, be careful not to overthink things... it's fundamentally quite straightforward.

> I have set up a two node cluster (of virtual machines) with a shared
> volume in dual-primary mode with ocfs2 as a basic infrastructure for
> some testing.

Do you have fencing? Dual-primary cannot operate safely without a mechanism for ensuring the state of the remote node.

> I need that when one of the two nodes goes down the other continues
> working normally (we can assume that the other node never will recover
> again), but when one node fails

The assumption that the other node will never return is not one that DRBD can make. This is where fencing comes in... When a node loses contact with its peer, it has no way of knowing what state the remote node is in. Is it still running, but convinced that the local node is gone? Is the silent node hung, but liable to return at some point? Is the remote node powered off? The only thing you know is what you don't know.

Consider: both nodes, had they simply assumed "silence == death", go StandAlone and Primary. During this time, data is written to either node but is not replicated. Now you have divergent data, and the only way to recover is to invalidate the changes on one of the nodes. Data loss.

The solution is fencing plus resource management, which is what Andreas meant when he asked about Pacemaker vs. rgmanager.

> the other enters the WFConnection state and the volume is disconnected.
> I have set up the standard set of policies for split brain:
>
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
>
> Which policy should I use to achieve the desired behaviour (if one
> node fails, the other continues working alone)?
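(As an aside on the fencing point above: in DRBD 8.x, enabling fencing usually looks something like the snippet below in the resource definition. This is a sketch only; the handler paths assume a Pacemaker cluster using the crm-fence-peer.sh script shipped with DRBD, and may differ on your distro.)

  resource r0 {
    disk {
      # Freeze I/O and call the fence-peer handler when contact
      # with the peer is lost
      fencing resource-and-stonith;
    }
    handlers {
      # Helper scripts shipped with DRBD for Pacemaker clusters;
      # they place/remove a location constraint on the peer
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }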
> Regards

Again, as Andreas indicated, those settings control the policy applied when comms are lost (be it because of a network error, the peer dying or hanging, whatever). It is by design that a node, after losing its peer, goes into WFConnection (waiting for connection). In this state, if/when the peer recovers (as it often does with power fencing), it can re-establish the connection, sync the changes and return to a normal operating state.

-- 
Digimer
E-Mail: digimer at alteeve.com
Freenode handle: digimer
Papers and Projects: http://alteeve.com
Node Assassin: http://nodeassassin.org
"omg my singularity battery is dead again. stupid hawking radiation." - epitron