On 01/31/2012 08:51 AM, Xing, Steven wrote:
> Thanks a lot, Kaloyan.
> Could you give me more detail about the way you mentioned for make the previous primary automatically promote even the disk status are "Consistent", even it is not safe.
> Do I need write some script or just need change some drbd settings?
> Thanks again. That would be very helpful.

Elaborating on what Kaloyan said: you will be running the risk of a split-brain situation, which can lead to data loss. Automating the promotion of a Consistent node to UpToDate is highly ill-advised. It is much wiser to avoid the situation in the first place.

The problem is that, in clustering, there is an idea that "The only thing you don't know is what you don't know." When the old primary recovers, it can't know what happened to its peer while it was offline. As Kaloyan said, if the secondary had been promoted to primary, then the old primary has an old view of the data. Suppose you force the old primary to UpToDate and start writing data to it, while in the time since the fault the backup had been made primary and had data written to it. You now have good data on both nodes that is out of sync. The only way to recover is to discard the changes on one of the nodes; hence, data loss.

With that said: if the DRBD resource is part of a proper cluster, like Pacemaker or RHCS, then you can tie DRBD's fencing into the cluster using crm-fence-peer.sh on Pacemaker, or obliterate-peer.sh/rhcs_fence on RHCS. Set up DRBD as a resource of the cluster, and then configure the cluster to fence its peer on startup if the peer doesn't respond (assuming a 2-node cluster). This works because the cluster will start and (power) fence the peer. Assuming the peer isn't dead, just off, the peer should boot up. The node will then start DRBD, which will wait for its peer. Meanwhile, the peer is booting and should come online, join the cluster and start DRBD.
As soon as it does, the old Primary will know that it really is UpToDate and will start up safely.

If you want to force the issue, though, you can use 'wfc-timeout 300', which tells DRBD to wait up to 5 minutes for its peer. After that time, it will consider itself primary. Please don't use this, though, until you've exhausted every other way of starting safely.

--
Digimer
E-Mail: digimer at alteeve.com
Papers and Projects: https://alteeve.com
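For reference, here is a minimal sketch of how the pieces mentioned above might look in a DRBD 8.3-style drbd.conf on a Pacemaker cluster. It is not taken from the post: the resource name `r0` is hypothetical, the script paths under /usr/lib/drbd/ are assumptions that vary by distribution and package, and the `fencing resource-only` policy plus the `crm-unfence-peer.sh` handler follow the DRBD user's guide for Pacemaker resource-level fencing. On RHCS, `fence-peer` would point at obliterate-peer.sh or rhcs_fence instead (install path not given here).

```
# Hypothetical resource name; the per-host "on" sections are omitted.
resource r0 {
  startup {
    # Wait up to 300 seconds (5 minutes) for the peer when DRBD starts.
    wfc-timeout 300;
  }
  disk {
    # When a Primary loses its connection, call the fence-peer handler
    # to fence the peer's disk.
    fencing resource-only;
  }
  handlers {
    # Assumed install paths; check where your package puts these scripts.
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

As the post says, treat the `wfc-timeout` line as a last resort; the fencing settings are the part meant to keep the two nodes from diverging.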