[DRBD-user] what causes SC to end up in rule_nr 4 in a normal shutdown sequence

putcha narayana putcha_laks at hotmail.com
Wed Sep 15 13:31:14 CEST 2010



 

I am running managed failovers, and after 7 failovers I get the following dmesg output.

In a managed failover, the DRBD partition is unmounted first and then "drbdadm secondary all" is invoked.

The network is not touched until the DRBD transition to Secondary is done.

 

Any help on avoiding this in managed failovers is appreciated.

 

drbd0: drbd_sync_handshake:
drbd0: self B3846B4B8BDF8064:B3846B4B8BDF8065:E3596DA0539760BC:0ECF8BD380756C2D
drbd0: peer B3846B4B8BDF8065:B3846B4B8BDF8065:E3596DA0539760BD:0ECF8BD380756C2D
drbd0: uuid_compare()=0 by rule 4
drbd0: No resync, but 278 bits in bitmap!


 

*rule_nr = 4;
if (self == peer) { /* Common power [off|failure] */
	int rct, dc; /* roles at crash time */

	rct = (test_bit(CRASHED_PRIMARY, &mdev->flags) ? 1 : 0) +
		(mdev->p_uuid[UUID_FLAGS] & 2);
	/* lowest bit is set when we were primary,
	 * next bit (weight 2) is set when peer was primary */

	MTRACE(TraceTypeUuid, TraceLvlMetrics, DUMPI(rct); );

	switch (rct) {
	case 0: /* !self_pri && !peer_pri */ return 0;
	case 1: /*  self_pri && !peer_pri */ return 1;
	case 2: /* !self_pri &&  peer_pri */ return -1;
	case 3: /*  self_pri &&  peer_pri */
		dc = test_bit(DISCARD_CONCURRENT, &mdev->flags);
		MTRACE(TraceTypeUuid, TraceLvlMetrics, DUMPI(dc); );
		return dc ? -1 : 1;
	}
}


Thanks and Regards
Lak

 


From: putcha_laks at hotmail.com
To: drbd-user at lists.linbit.com
Date: Wed, 15 Sep 2010 04:35:03 +0000
Subject: [DRBD-user] What is the recommended recovery action/config for rules 5 and 6




DRBD version: 8.0.16
We are forced to use this version of DRBD because of the older kernel version.
Roughly every 10 failovers I run into either rule_nr = 5 or rule_nr = 6.
We have been trying to figure out the root cause for over a month, but in vain.

We have terminated all applications accessing the DRBD partition before unmounting it and calling "drbdadm secondary all".
With this change we were able to run about 30 failovers successfully, but after that we hit rule_nr 5 again.

Can any of you help us with a recommended recovery action / config change for rules 5 and 6?
 
*rule_nr = 5;
peer = mdev->p_uuid[Bitmap] & ~((u64)1);
if (self == peer)
	return -1;

*rule_nr = 6;
for (i = History_start; i <= History_end; i++) {
	peer = mdev->p_uuid[i] & ~((u64)1);
	if (self == peer)
		return -2;
}
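As the code shows, rule 5 matches our current UUID against the peer's bitmap UUID and rule 6 matches it against the peer's history UUIDs; both return a negative value, i.e. the local node becomes the target of the resync. On the 8.0 series the usual configuration knobs for recovering from diverged data are the after-split-brain policies in the net section. The fragment below is only an illustrative sketch, not a recommendation (the resource name "r0" is made up); "disconnect" is the conservative choice that leaves recovery to the administrator:

```
resource r0 {
  net {
    after-sb-0pri discard-zero-changes;  # neither node was Primary at split
    after-sb-1pri consensus;             # one node was Primary
    after-sb-2pri disconnect;            # both were Primary: manual recovery
  }
}
```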

Thanks and Regards
Lak.



