[DRBD-user] Possible DRBD Desync After Outage - Why?

Mike Sweetser - Adhost mikesw at adhost.com
Tue Jul 7 18:08:43 CEST 2009


Hello:

We have two DRBD machines running RHEL 5.3 with DRBD 8.3.0.  Recently, an outage took down the primary server in the cluster, and the cluster failed over to the other server using DRBD and Heartbeat.  The failover completed with no issues.

When the other server came back online, we initiated a manual resync as follows:

drbdadm secondary RESOURCE
drbdadm -- --discard-my-data connect RESOURCE

Then, from the live server, we ran drbdadm connect RESOURCE, and it connected and resynced.

Assuming all of this was done correctly, we have since run into another issue: some people have complained that their files have "reverted" to a previous state.  We saw no errors during the synchronization, and never saw any "oos" in the DRBD status.

So how could this have happened?  What can be done, other than running "drbdadm verify" regularly, to guard against this problem?  And honestly, why is manual verification necessary at all, when data integrity should be a fundamental part of any file system duplication of this kind?
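
For reference, the "oos" check mentioned above could be scripted along these lines.  This is only a minimal sketch: it assumes the DRBD 8.3 status layout, in which each device's counter line in /proc/drbd carries an "oos:<KiB>" field, and the function name total_oos is made up for illustration.

```shell
# Hypothetical helper (not part of DRBD): sum every "oos:<n>" counter
# found on stdin and print the total.  Assumes the /proc/drbd layout of
# DRBD 8.3, where oos is reported in KiB on each device's counter line.
total_oos() {
    # Extract the number after "oos:" on each line that has one,
    # then add the numbers up; prints 0 when no counter is present.
    sed -n 's/.*oos:\([0-9][0-9]*\).*/\1/p' |
        awk '{ s += $1 } END { print s + 0 }'
}
```

It would be run as "total_oos < /proc/drbd" on either node; anything other than 0 means blocks are currently marked out of sync.  Note that this only reports what DRBD already knows about, so it does not replace a "drbdadm verify" run.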

I've attached my drbd.conf here - feel free to mention if I've done something stupid.

resource r1 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo 'DRBD: primary requested but inconsistent!' | wall; /etc/init.d/heartbeat stop"; #"halt -f";
    pri-lost-after-sb "echo 'DRBD: primary requested but lost!' | wall; /etc/init.d/heartbeat stop"; #"halt -f";
  }

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
    wfc-timeout 120;    # 2 minutes.
  }

  disk {
    no-disk-flushes;
    on-io-error   detach;
  }

  net {
    timeout 120;
    connect-int 20;
    ping-int 20;
    max-buffers     2048;
    max-epoch-size  2048;
    ko-count 30;
    cram-hmac-alg "sha1";
    shared-secret "******";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
  }

  syncer {
    rate 30M;
    al-extents 503;
    verify-alg sha1;
  }

  on SERVER1 {
    device    /dev/drbd1;
    disk      /dev/sda7;
    address   172.16.2.1:7789;
    meta-disk /dev/sda6[1];
  }

  on SERVER2 {
    device    /dev/drbd1;
    disk      /dev/sda7;
    address   172.16.2.2:7789;
    meta-disk /dev/sda6[1];
  }
}