[DRBD-user] Current Primary shall become sync TARGET! Aborting to prevent data corruption.

Jonathan Trott drbd at macsupport.org
Sat Feb 19 13:15:15 CET 2005


The situation:

drbd 0.7.10
Fedora Core 1
kernel 2.4.26

Osacon2 is the active member of the heartbeat-based cluster. A STONITH 
event is triggered by unplugging both heartbeat cables (including the 
drbd sync cable). The STONITH succeeds in failing over the cluster, and 
osacon1 becomes the active member. When osacon2 loads the drbd service 
during the boot process, osacon1 refuses the connection and goes into 
the StandAlone cstate, which leaves the cluster without redundancy. 
Recovering requires running a drbdsetup /dev/drbd0 net command on 
osacon1 and restarting the drbd service on osacon2. Not a very 
automated process.
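
Until the cause is understood, the StandAlone state could at least be 
detected automatically. The following is only a minimal sketch, assuming 
that /proc/drbd reports each device on a line containing "cs:<state>". 
The function names and the SAMPLE text are hypothetical, written for 
illustration and not captured from these machines.

```python
import re

# Assumed line shape: "<minor>: cs:<ConnectionState> ..." (an assumption,
# check it against the real /proc/drbd on your system).
CSTATE_RE = re.compile(r"^\s*(\d+):\s+cs:(\w+)", re.MULTILINE)


def connection_states(proc_drbd_text):
    """Return a dict mapping device minor number to its cstate string."""
    return {int(m.group(1)): m.group(2)
            for m in CSTATE_RE.finditer(proc_drbd_text)}


def standalone_devices(proc_drbd_text):
    """Return the sorted minor numbers whose cstate is StandAlone."""
    states = connection_states(proc_drbd_text)
    return sorted(minor for minor, cs in states.items() if cs == "StandAlone")


# Hypothetical sample text, for illustration only.
SAMPLE = " 0: cs:StandAlone st:Primary/Unknown ld:Consistent\n"

if __name__ == "__main__":
    print(standalone_devices(SAMPLE))
```

In practice the text would be read from /proc/drbd on osacon1 and the 
result used to raise an alert; the reconnect itself would still be the 
manual procedure described above.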

The question:
Why does drbd come up, error out, and get left in the StandAlone cstate? 
Shouldn't drbd on osacon2 be Secondary as it loads, and therefore not 
cause an error when it tries to sync with osacon1? Is there some way to 
avoid this in this scenario? It is completely reproducible and severely 
degrades the redundancy of this cluster.

If the drbd sync cable is unplugged and then plugged back in a minute 
later, there are no problems re-establishing the drbd connection. The 
problem only occurs if the Primary is rebooted and the other node 
becomes Primary before it comes back online.

The following logs are from the event where the drbd service starts on 
osacon2 after the STONITH event.

Configuration file follows the log entries.

*** Active cluster member
Feb 19 16:58:44 osacon1 kernel: drbd0: drbd0_receiver [3372]: cstate 
WFConnection --> WFReportParams
Feb 19 16:58:44 osacon1 kernel: drbd0: Handshake successful: DRBD 
Network Protocol version 74
Feb 19 16:58:44 osacon1 kernel: drbd0: Connection established.
Feb 19 16:58:44 osacon1 kernel: drbd0: I am(P): 
1:00000002:00000001:00000009:00000003:10
Feb 19 16:58:44 osacon1 kernel: drbd0: Peer(S): 
1:00000002:00000001:0000000a:00000002:10
Feb 19 16:58:44 osacon1 kernel: drbd0: Current Primary shall become 
sync TARGET! Aborting to prevent data corruption.
Feb 19 16:58:44 osacon1 kernel: drbd0: drbd0_receiver [3372]: cstate 
WFReportParams --> StandAlone
Feb 19 16:58:44 osacon1 kernel: drbd0: error receiving ReportParams, l: 
72!
Feb 19 16:58:44 osacon1 kernel: drbd0: asender terminated
Feb 19 16:58:44 osacon1 kernel: drbd0: worker terminated
Feb 19 16:58:44 osacon1 kernel: drbd0: drbd0_receiver [3372]: cstate 
StandAlone --> StandAlone
Feb 19 16:58:44 osacon1 kernel: drbd0: Connection lost.
Feb 19 16:58:44 osacon1 kernel: drbd0: receiver terminated

*** Passive cluster member booting up after a STONITH event (was active 
before STONITH)
Feb 19 16:58:44 osacon2 kernel: drbd0: drbd0_receiver [1574]: cstate 
WFConnection --> WFReportParams
Feb 19 16:58:44 osacon2 kernel: drbd0: Handshake successful: DRBD 
Network Protocol version 74
Feb 19 16:58:44 osacon2 kernel: drbd0: Connection established.
Feb 19 16:58:44 osacon2 kernel: drbd0: I am(S): 
1:00000002:00000001:0000000a:00000002:10
Feb 19 16:58:44 osacon2 kernel: drbd0: Peer(P): 
1:00000002:00000001:00000009:00000003:10
Feb 19 16:58:44 osacon2 kernel: drbd0: drbd0_receiver [1574]: cstate 
WFReportParams --> WFBitMapS
Feb 19 16:58:44 osacon2 kernel: drbd0: meta connection shut down by 
peer.
Feb 19 16:58:44 osacon2 kernel: drbd0: drbd0_asender [1606]: cstate 
WFBitMapS --> NetworkFailure
Feb 19 16:58:44 osacon2 kernel: drbd0: asender terminated
Feb 19 16:58:44 osacon2 drbd: WARN: stdin/stdout is not a TTY; using 
/dev/console
Feb 19 16:58:44 osacon2 kernel: drbd0: sock_sendmsg returned -104
Feb 19 16:58:44 osacon2 kernel: drbd0: drbd0_receiver [1574]: cstate 
NetworkFailure --> BrokenPipe
Feb 19 16:58:44 osacon2 kernel: drbd0: short sent ReportBitMap 
size=4096 sent=3800
Feb 19 16:58:44 osacon2 rc: Starting drbd:  succeeded
Feb 19 16:58:44 osacon2 kernel: drbd0: Secondary/Unknown --> 
Secondary/Primary
Feb 19 16:58:44 osacon2 kernel: drbd0: sock was shut down by peer
Feb 19 16:58:44 osacon2 kernel: drbd0: drbd0_receiver [1574]: cstate 
BrokenPipe --> BrokenPipe
Feb 19 16:58:44 osacon2 kernel: drbd0: short read expecting header on 
sock: r=0
Feb 19 16:58:44 osacon2 kernel: drbd0: worker terminated
Feb 19 16:58:44 osacon2 kernel: drbd0: drbd0_receiver [1574]: cstate 
BrokenPipe --> Unconnected
Feb 19 16:58:44 osacon2 kernel: drbd0: Connection lost.
Feb 19 16:58:44 osacon2 kernel: drbd0: drbd0_receiver [1574]: cstate 
Unconnected --> WFConnection
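
To make the mismatch easier to see, here is a small sketch that compares 
the "I am" and "Peer" counter lines logged above, field by field. It 
makes no assumption about what each field means; it only reports the 
positions that differ. The function names are hypothetical, every field 
(including the last one) is parsed as hexadecimal as an assumption, and 
the wrapped log lines are joined by hand before parsing.

```python
import re

# Matches the "I am(P): ..." and "Peer(S): ..." lines from the kernel log.
COUNTER_RE = re.compile(r"(I am|Peer)\((\w)\):\s*([0-9a-fA-F:]+)")


def parse_counters(line):
    """Return (who, role, fields) with fields parsed as hex integers."""
    m = COUNTER_RE.search(line)
    if m is None:
        raise ValueError("not a counter line: %r" % line)
    who, role, raw = m.groups()
    return who, role, [int(field, 16) for field in raw.split(":")]


def differing_fields(mine, theirs):
    """Return (index, my_value, peer_value) for each position that differs."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(mine, theirs)) if a != b]


# The two lines as logged on osacon1, with the wrapped halves joined.
local = parse_counters("drbd0: I am(P): 1:00000002:00000001:00000009:00000003:10")
peer = parse_counters("drbd0: Peer(S): 1:00000002:00000001:0000000a:00000002:10")

if __name__ == "__main__":
    for index, mine, theirs in differing_fields(local[2], peer[2]):
        print("field %d: local=%#x peer=%#x" % (index, mine, theirs))
```

For these two lines, only the fourth and fifth fields differ: the peer 
(osacon2) is one higher in the fourth field, and osacon1 is one higher 
in the fifth.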

/etc/drbd.conf

resource drbd0 {
   protocol C;
   incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
   startup {
     degr-wfc-timeout 120;    # 2 minutes.
   }

   disk {
     on-io-error   detach;
   }

   net {
     on-disconnect reconnect;
   }

   syncer {
     rate 15M;
     group 1;
     al-extents 257;
   }

   on osacon1.osa.int {
     device      /dev/drbd0;
     disk        /dev/hda6;
     address     10.127.0.2:7788;
     meta-disk   internal;
   }

   on osacon2.osa.int {
     device      /dev/drbd0;
     disk        /dev/hda6;
     address     10.127.0.3:7788;
     meta-disk   internal;
   }
}

Thanks,
JT



