On Monday, 12 September 2005 18:06, Guus Houtzager wrote:
> Hi,
>
> For some time now I've been trying to build an HA setup with drbd and I
> keep running into the same thing every time. I'm building a setup in
> which one half of the drbd will be in a geographically different
> location, so I need to test what happens if the network connection
> between those locations fails. To that end I've configured 2 machines,
> fs1 and fs2, each with 2 NICs. One of those NICs per machine is
> dedicated to drbd. I didn't use a crossover cable, but connected those
> to a managed switch, so I can easily shut down ports to simulate a
> network failure.
>
> When all switch ports are enabled, everything works just fine. Drbd
> works, I can swap the primary to the other machine (first drbdadm
> secondary all on the primary, then drbdadm primary all on the
> secondary), no problem whatsoever.
>
> I have one drbd resource, r1, everything started and ok (connected and
> consistent), fs1 = primary, fs2 = secondary.
>
> At that point I shut down the switch port of the drbd NIC of fs1. A few
> seconds later both sides notice they can't connect to the other side and
> change the status of that side to Unknown. Now I want fs2 to become
> primary (apparently something is wrong with fs1, so I want application
> servers at the location of fs2 to take over with fs2 as fileserver), so
> I do a drbdadm primary all on fs2 and a drbdadm secondary all on fs1
> (just to be sure, can't have 2 primaries when I re-enable the
> switch port). Both sides update their status accordingly.
>
> If I then re-enable the switch port, both sides "see" each other again,
> but won't reconnect, because fs1 wants to sync as source with fs2 as
> target. That seems totally wrong to me. I expect fs1 to become a
> secondary with fs2 primary. Fs2 does refuse the sync (as it should) and
> aborts.
> The strange part is that if I stop the drbd device on fs2 and
> restart it, it comes up as secondary (correct) and syncs back to fs1
> with fs2 source and fs1 target, just as it should!
>
> I'm running 0.7.13 on a 2.4 kernel.
>
> I hope someone can help me out with this!
>

Hi,

At the moment you disconnect the device pair while fs1 is primary, we
have to assume that the data is modified on fs1. It is not possible to
do a graceful switchover while fs1 and fs2 are disconnected.

What you do with the "graceful switchover" looks like a split-brain
situation to the algorithms of DRBD-0.7. DRBD-0.7 implements an
auto-recover logic that says: the node that was primary at split-brain
time has the good data.

To really understand what goes on, please read this paper:
http://www.drbd.org/fileadmin/drbd/publications/drbd_paper_for_NLUUG_2001.pdf

And... there are tools to modify the generation counters (GCs)
__offline__ (this means /etc/init.d/drbd stop!!)

All this got fixed in drbd-8. There you would express what you want
with these config statements:

  after-sb-0pri discard-older-primary;
  after-sb-1pri consensus;
  after-sb-2pri disconnect;

[ See item 5 of http://svn.drbd.org/drbd/trunk/ROADMAP ]

The bottom line: This is a shortcoming of drbd-0.7 that is already
fixed in drbd-8. DRBD-8 is not yet ready....

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
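
The recovery rule described in the reply ("the node that was primary at split-brain time has the good data") can be illustrated with a small model. This is only a sketch of the rule as stated in the message, not DRBD's actual code: the `Node` class, its fields, and the `sync_source_07` function are hypothetical names made up for the example, and the real 0.7 decision is based on metadata that is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical, simplified view of one DRBD peer (not a DRBD data structure)."""
    name: str
    primary_at_disconnect: bool  # role at the moment the link dropped
    role_now: str                # role when the link comes back


def sync_source_07(a: Node, b: Node) -> Optional[Node]:
    """Pick the sync source according to the rule quoted in the reply.

    Returns the node that was primary when the connection was lost,
    or None if the rule does not single out exactly one node.
    """
    candidates = [n for n in (a, b) if n.primary_at_disconnect]
    return candidates[0] if len(candidates) == 1 else None


# The scenario from the original question: fs1 was primary when the
# switch port went down; afterwards the roles were swapped by hand.
fs1 = Node("fs1", primary_at_disconnect=True, role_now="Secondary")
fs2 = Node("fs2", primary_at_disconnect=False, role_now="Primary")

source = sync_source_07(fs1, fs2)

# A node that is currently primary will not accept becoming sync target,
# which matches the refusal observed on fs2.
target = fs2 if source is fs1 else fs1
refused = source is not None and target.role_now == "Primary"

print(f"chosen sync source: {source.name if source else 'undecided'}")
print(f"sync refused by target: {refused}")
```

In this model the manual role swap made while disconnected does not change which node is chosen, which is why fs1 still wants to be sync source after the link returns.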