[DRBD-user] Help with recovering from network failure (failover and back again)

Philipp Reisner philipp.reisner at linbit.com
Tue Sep 13 11:06:29 CEST 2005



On Monday, 12 September 2005 18:06, Guus Houtzager wrote:
> Hi,
>
> For some time now I've been trying to build an HA setup with drbd and I
> keep running into the same thing every time. I'm building a setup in
> which 1 half of the drbd will be in a geographically different location,
> so I need to test what happens if the network connection between those
> locations fails. To that end I've configured 2 machines, fs1 and fs2,
> each with 2 nics. One of those nics per machine is dedicated for drbd. I
> didn't use a crosscable, but connected those to a managed switch, so I
> can easily shut down ports to simulate a network failure.
> When all switchports are enabled, everything works just fine. Drbd
> works, I can swap the primary to the other machine (first drbdadm
> secondary all on the primary, then drbdadm primary all on the secondary),
> no problem whatsoever.
> I have one drbd resource, r1, everything started and ok (connected and
> consistent), fs1 = primary, fs2 = secondary.
> At that point I shut down the switchport of the drbd nic of fs1. A few
> seconds later both sides notice they can't connect to the other side and
> change the status of that side to Unknown. Now I want fs2 to become
> primary (apparently something is wrong with fs1, so I want application
> servers on the location of fs2 to take over with fs2 as fileserver), so
> I do a drbdadm primary all on fs2 and a drbdadm secondary all on fs1
> (just to be sure, can't have 2 primaries when I re-enable the
> switchport). Both sides update their status accordingly.
> If I then re-enable the switchport, both sides "see" each other again,
> but won't reconnect, because fs1 wants to sync as source with fs2 as
> target. That seems totally wrong to me. I expect fs1 to become a
> secondary with fs2 primary. Fs2 does refuse the sync (as it should) and
> aborts. The strange part is that if I stop the drbd device on fs2 and
> restart it, it comes up as secondary (correct) and syncs back to fs1
> with fs2 source and fs1 target, just as it should!
> I'm running 0.7.13 on a 2.4 kernel.
> I hope someone can help me out with this!
>

Hi, 

  At the moment you disconnect the device pair while fs1 is primary,
  we have to assume that the data has been modified on fs1.
  It is not possible to do a graceful switchover while fs1 and fs2
  are disconnected.

  What you do as a "graceful switchover" looks like a split-brain
  situation to the algorithms of DRBD-0.7. DRBD-0.7 implements an
  auto-recover logic that says: the node that was primary at
  split-brain time has the good data.
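
  To make the rule above concrete, here is a minimal Python sketch of
  it, applied to the fs1/fs2 scenario from the question. It is only a
  model of the rule as stated here, not DRBD's code: the names Node
  and pick_sync_source_07 are made up for illustration, and raising an
  error when zero or two nodes were primary is an assumption of the
  sketch.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    primary_at_disconnect: bool  # role at the moment the link went down


def pick_sync_source_07(a: Node, b: Node) -> Node:
    """Return the node that the rule above treats as having the good data.

    Role changes made while the nodes are disconnected are deliberately
    not looked at: only the role at split-brain time counts.
    """
    if a.primary_at_disconnect and not b.primary_at_disconnect:
        return a
    if b.primary_at_disconnect and not a.primary_at_disconnect:
        return b
    # Assumption of this sketch: the rule gives no answer here.
    raise ValueError("zero or two primaries at disconnect: rule does not decide")


# The scenario from the question: fs1 was primary when the port went down.
# Promoting fs2 and demoting fs1 afterwards does not change the outcome.
fs1 = Node("fs1", primary_at_disconnect=True)
fs2 = Node("fs2", primary_at_disconnect=False)
print(pick_sync_source_07(fs1, fs2).name)  # prints "fs1"
```

  This is why fs1 wants to sync as source after the port is re-enabled,
  even though it has been made secondary in the meantime.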

  To really understand what goes on, please read the paper at
  http://www.drbd.org/fileadmin/drbd/publications/drbd_paper_for_NLUUG_2001.pdf
  There are also tools to modify the GCs (generation counters)
  __offline__ (this means /etc/init.d/drbd stop first!)


  All this is fixed in drbd-8. There you would express what you want
  with these config statements:

  after-sb-0pri discard-older-primary;
  after-sb-1pri consensus;
  after-sb-2pri disconnect;

  [ See item 5 of http://svn.drbd.org/drbd/trunk/ROADMAP ]
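
  As a rough illustration of what these three statements are meant to
  express, here is a small Python model. It follows my reading of how
  the policies are described in the drbd-8 drbd.conf documentation
  (0pri/1pri/2pri = number of primaries when the nodes meet again);
  it is not taken from the DRBD sources. Node, promoted_at and resolve
  are hypothetical names, and promoted_at is assumed to be a distinct
  logical timestamp of each node's last promotion to primary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    primary_now: bool  # role at the moment the nodes reconnect
    promoted_at: int   # hypothetical logical time of the last promotion


def discard_older_primary(a: Node, b: Node) -> Node:
    """after-sb-0pri discard-older-primary: keep the data of the node
    that became primary more recently, i.e. it is the sync source."""
    return a if a.promoted_at > b.promoted_at else b


def resolve(a: Node, b: Node) -> Optional[Node]:
    """Return the sync source, or None if the nodes should disconnect."""
    primaries = [n for n in (a, b) if n.primary_now]
    if len(primaries) == 0:
        return discard_older_primary(a, b)
    if len(primaries) == 1:
        # after-sb-1pri consensus: only resync if the 0pri policy would
        # also discard the current secondary; otherwise disconnect.
        source = discard_older_primary(a, b)
        return source if source.primary_now else None
    # after-sb-2pri disconnect: never resolve automatically.
    return None


# The scenario from the question: fs1 was primary first, then fs2 was
# promoted and fs1 demoted while the link was down.
fs1 = Node("fs1", primary_now=False, promoted_at=1)
fs2 = Node("fs2", primary_now=True, promoted_at=2)
source = resolve(fs1, fs2)
print(source.name if source else "disconnect")  # prints "fs2"
```

  Under these assumptions fs2 becomes the sync source and fs1 the
  target, which is the behaviour asked for in the question.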

The bottom line: this is a shortcoming of drbd-0.7 that is already
fixed in drbd-8. But DRBD-8 is not yet ready.

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :


