[DRBD-user] What causes nodes to become out-of-sync?

Jeffrey Froman drbd.tcijf at olympus.net
Mon Jul 21 18:53:09 CEST 2008



> what are the sorts of things that might be causing 
> blocks to become out-of-sync on this resource?

A bit of follow-up: verification errors have been hitting us this week 
with increased frequency. We are using Protocol C for replication of 
a 135GB resource, and the secondary device reports no disk errors of 
any kind.

Yet verification continues to fail -- now up to about three times per week
(we check daily). We now have little confidence that we can reliably
switch roles between the two nodes at any given time, since we have no
idea when or why synchronization is breaking.

Is it possible that we are experiencing a race condition between 
verification and replication, and that the blocks which verification 
finds out-of-sync have actually just been updated by normal 
replication between nodes during the checksum comparison?

If so, is there any option that will force retries of failed checksum 
comparisons for each block that is compared during verification?

Is there anything else I can add to my configuration to help determine 
exactly when these blocks are falling out-of-sync?
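For anyone wanting to watch this directly: on DRBD 8, /proc/drbd reports an "oos:" field (out-of-sync, in KiB) per device. A small sketch of extracting it, run here against a captured sample of /proc/drbd-style output so it works without a live DRBD node (on a real node you would read /proc/drbd instead):

```shell
#!/bin/sh
# Extract the out-of-sync counter ("oos:", in KiB) for device 0 from
# /proc/drbd-style status output. A captured sample line is used here so
# the sketch runs anywhere; on a live node, read /proc/drbd instead.
sample='
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:1048576 nr:0 dw:524288 dr:524288 al:12 bm:34 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:128
'

# Pull the number after "oos:" from whichever line carries it.
oos_kib=$(printf '%s\n' "$sample" | sed -n 's/.*oos:\([0-9]*\).*/\1/p')
echo "out-of-sync: ${oos_kib} KiB"
# prints: out-of-sync: 128 KiB
```

Logging this value just before and just after each scheduled verify run would at least narrow down whether the blocks go out of sync during verification or between runs.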

Any suggestions appreciated. The configuration for this resource is 
detailed below.

Thank you,

resource r0 {
    protocol C;
    syncer {
        rate 100M;
        verify-alg sha1;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "secret";
    }

    handlers {
        out-of-sync "/opt/drbd/handlers/out-of-sync.sh";
    }

    disk {
        size 143372060K;
        on-io-error pass_on;
    }

    on host1 {
        device     /dev/drbd0;
        disk       /dev/sdc1;
        meta-disk  /dev/sda10[0];
    }

    on host2 {
        device     /dev/drbd0;
        disk       /dev/mapper/VolGroup00-LogVol02;
        meta-disk  /dev/VolGroup00/LogVol03[0];
    }
}
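For completeness, a minimal handler of the sort wired up above might simply timestamp each event for later correlation. This is only a sketch, not our production script; DRBD sets DRBD_RESOURCE in the handler's environment, and the log path here is arbitrary:

```shell
#!/bin/sh
# Sketch of an out-of-sync handler: append a timestamped record so that
# out-of-sync events can later be correlated with other system activity.
# DRBD exports DRBD_RESOURCE when invoking the handler; the log path is
# an arbitrary choice for illustration.
LOG=${LOG:-/tmp/drbd-out-of-sync.log}
printf '%s out-of-sync on resource %s\n' \
    "$(date '+%Y-%m-%d %H:%M:%S')" "${DRBD_RESOURCE:-unknown}" >> "$LOG"
```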
