[DRBD-user] drbd resource is not getting promoted

Theophanis Kontogiannis theophanis_kontogiannis at yahoo.com
Sun Nov 15 13:00:40 CET 2009


Hello all,

I have a two-node CentOS 5.4 cluster with a self-compiled drbd-8.3.5-3, built
against kernel 2.6.18-164.6.1.el5.

The cluster runs the DRBD resources in primary/primary mode.

However, the problem always shows up on tweety1: drbd2 is not promoted
almost every time, and drbd0 and drbd1 are sometimes not promoted either.

Following are the kernel messages on tweety1 for the last event related to
r2:

block drbd2: Starting worker thread (from cqueue/0 [183])
block drbd2: disk( Diskless -> Attaching ) 
block drbd2: Found 6 transactions (276 active extents) in activity log.
block drbd2: Method to ensure write ordering: barrier
block drbd2: max_segment_size ( = BIO size ) = 32768
block drbd2: drbd_bm_resize called with capacity == 1953460304
block drbd2: resync bitmap: bits=244182538 words=3815353
block drbd2: size = 931 GB (976730152 KB)
block drbd2: recounting of set bits took additional 70 jiffies
block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd2: disk( Attaching -> Outdated ) pdsk( DUnknown -> Outdated ) 
block drbd2: conn( StandAlone -> Unconnected ) 
block drbd2: Starting receiver thread (from drbd2_worker [2746])
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection ) 
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams ) 
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self 9CFD298D943949EE:0000000000000000:9C3AD9517D750E0D:8A45AE63D53852DB bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BD:9CFD298D943949EF:9C3AD9517D750E0D:8A45AE63D53852DB bits:127 flags:0
block drbd2: uuid_compare()=-1 by rule 50
block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( Outdated -> UpToDate )
block drbd2: conn( WFBitMapT -> WFSyncUUID )
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)
block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
block drbd2: Began resync as SyncTarget (will sync 508 KB [127 bits set]).
block drbd2: Resync done (total 1 sec; paused 0 sec; 508 K/sec)
block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)
block drbd2: peer( Primary -> Secondary )
block drbd2: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
block drbd2: meta connection shut down by peer.
block drbd2: asender terminated
block drbd2: Terminating asender thread
block drbd2: Connection closed
block drbd2: conn( TearDown -> Unconnected ) 
block drbd2: receiver terminated
block drbd2: Restarting receiver thread
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection ) 
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams ) 
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: uuid_compare()=0 by rule 40
block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
block drbd2: peer( Secondary -> Primary )
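The bitmap figures in the log above are internally consistent, which a few lines of arithmetic can confirm. The Python sketch below is only an illustration: it assumes one bitmap bit per 4 KiB block and 64-bit bitmap words. Both assumptions are inferred from the logged numbers (127 bits set = 508 KB) rather than taken from DRBD documentation, and all variable names are hypothetical.

```python
# Figures copied from the drbd2 kernel log above.
CAPACITY_SECTORS = 1953460304  # "drbd_bm_resize called with capacity == ..."
LOGGED_SIZE_KB = 976730152     # "size = 931 GB (976730152 KB)"
LOGGED_BITS = 244182538        # "resync bitmap: bits=..."
LOGGED_WORDS = 3815353         # "resync bitmap: ... words=..."

SECTOR_BYTES = 512
BM_BLOCK_KB = 4   # assumption: each bitmap bit tracks one 4 KiB block
WORD_BITS = 64    # assumption: 64-bit bitmap words, which reproduces the logged word count

size_kb = CAPACITY_SECTORS * SECTOR_BYTES // 1024
size_gib = size_kb // (1024 * 1024)
bits = size_kb // BM_BLOCK_KB
words = -(-bits // WORD_BITS)  # ceiling division: a partial word still needs a whole word


def resync_kb(bits_set):
    """Amount of data covered by `bits_set` out-of-sync bits, in KiB."""
    return bits_set * BM_BLOCK_KB


print(size_kb == LOGGED_SIZE_KB, size_gib, bits == LOGGED_BITS, words == LOGGED_WORDS)
print("127 bits set ->", resync_kb(127), "KB")  # the log reports 508 KB
```

In other words, the resync that precedes the disconnect only had to move 508 KB, so its size alone does not explain the missing promotion.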


And on tweety2:

block drbd2: Starting worker thread (from cqueue/0 [183])
block drbd2: disk( Diskless -> Attaching ) 
block drbd2: Found 6 transactions (276 active extents) in activity log.
block drbd2: Method to ensure write ordering: barrier
block drbd2: max_segment_size ( = BIO size ) = 32768
block drbd2: drbd_bm_resize called with capacity == 1953460304
block drbd2: resync bitmap: bits=244182538 words=3815353
block drbd2: size = 931 GB (976730152 KB)
block drbd2: recounting of set bits took additional 70 jiffies
block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd2: disk( Attaching -> Outdated ) pdsk( DUnknown -> Outdated ) 
block drbd2: conn( StandAlone -> Unconnected ) 
block drbd2: Starting receiver thread (from drbd2_worker [2746])
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection ) 
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams ) 
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self 9CFD298D943949EE:0000000000000000:9C3AD9517D750E0D:8A45AE63D53852DB bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BD:9CFD298D943949EF:9C3AD9517D750E0D:8A45AE63D53852DB bits:127 flags:0
block drbd2: uuid_compare()=-1 by rule 50
block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( Outdated -> UpToDate )
block drbd2: conn( WFBitMapT -> WFSyncUUID )
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)
block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
block drbd2: Began resync as SyncTarget (will sync 508 KB [127 bits set]).
block drbd2: Resync done (total 1 sec; paused 0 sec; 508 K/sec)
block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)
block drbd2: peer( Primary -> Secondary )
block drbd2: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
block drbd2: meta connection shut down by peer.
block drbd2: asender terminated
block drbd2: Terminating asender thread
block drbd2: Connection closed
block drbd2: conn( TearDown -> Unconnected ) 
block drbd2: receiver terminated
block drbd2: Restarting receiver thread
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection ) 
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams ) 
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: uuid_compare()=0 by rule 40
block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
block drbd2: peer( Secondary -> Primary )
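To see at a glance which states each node passed through, the `field( old -> new )` fragments in these logs can be extracted with a regular expression. This is a minimal Python sketch; the names `TRANSITION`, `transitions`, `local_role_changes` and `SAMPLE` are hypothetical. It assumes that a change of the local role would be logged as `role( old -> new )`, analogous to the `peer( ... )` entries; in the logs above only the peer's role ever changes.

```python
import re

# Matches state-change fragments such as "conn( WFConnection -> WFReportParams )".
# \s* also tolerates fragments that a mail client wrapped across two lines.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def transitions(log_text):
    """Return (field, old, new) for every state change found in the log text."""
    return TRANSITION.findall(log_text)


def local_role_changes(log_text):
    """Keep only local role changes; assumes they are logged as role( old -> new )."""
    return [t for t in transitions(log_text) if t[0] == "role"]


# The last lines of the tweety1 log, as wrapped in the original mail.
SAMPLE = """\
block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams ->
Connected ) pdsk( DUnknown -> UpToDate )
block drbd2: peer( Secondary -> Primary )
"""

if __name__ == "__main__":
    for field, old, new in transitions(SAMPLE):
        print(f"{field:5} {old} -> {new}")
    print("local role changes:", local_role_changes(SAMPLE))
```

Feeding it a full node log (read from a file of your choice) gives the same per-field summary for the whole event.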



My drbd.conf is:


global {
    # minor-count 64;

    # dialog-refresh 5; # 5 seconds

    # disable-ip-verification;

    usage-count yes;
}



common {

  protocol C;

  syncer {

    rate 100M;

    #after "r2";
    al-extents 257;
  }
  
handlers {
    

    pri-on-incon-degr "echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";


    outdate-peer "/sbin/obliterate";

    pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root; echo b > /proc/sysrq-trigger ; reboot -f";

    split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";

  }

  startup {
    wfc-timeout  60;

    degr-wfc-timeout 60;    # 1 minute.
    #wait-after-sb;

    outdated-wfc-timeout 30;

    become-primary-on both;

  }

  disk {
    #on-io-error   pass-on;

    fencing resource-and-stonith;
    # size 10G;
  }

  net {
    
     sndbuf-size 512k;

     timeout       60;    #  6 seconds  (unit = 0.1 seconds)
     connect-int   10;    # 10 seconds  (unit = 1 second)
     ping-int      10;    # 10 seconds  (unit = 1 second)
     ping-timeout  50;    #  5 seconds  (unit = 0.1 seconds)

     max-buffers     2048;

    # unplug-watermark   128;
     max-epoch-size  2048;
     ko-count 10;

    allow-two-primaries;

      cram-hmac-alg "*****";
      shared-secret "*****";
    after-sb-0pri discard-least-changes;
    #after-sb-0pri discard-younger-primary;
    #after-sb-0pri discard-older-primary;

    after-sb-1pri violently-as0p;

    after-sb-2pri violently-as0p;
    rr-conflict call-pri-lost;


#    data-integrity-alg "crc32c";

  }


}


resource r0 {

        device          /dev/drbd0;
        disk            /dev/hda4;
        meta-disk       internal;

on tweety-1 { address   10.254.254.253:7788; }

on tweety-2 { address   10.254.254.254:7788; }

}

resource r1 {

        device        /dev/drbd1;
        disk          /dev/hdb4;
        meta-disk     internal;

  on tweety-1 { address  10.254.254.253:7789; }

  on tweety-2 { address  10.254.254.254:7789; }
}

resource r2 {

        device          /dev/drbd2;
        disk            /dev/sda1;
        meta-disk       internal;

  on tweety-1 { address  10.254.254.253:7790; }

  on tweety-2 { address  10.254.254.254:7790; }
}
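The net section mixes two units: `timeout` and `ping-timeout` count tenths of a second, while `connect-int` and `ping-int` count seconds (as the comments in the config say). The Python sketch below converts them to seconds using integer deciseconds, so that no floating-point drift creeps in. It also checks the rule that `timeout` must be lower than both `connect-int` and `ping-int`, which is our reading of the drbd.conf(5) man page and should be treated as an assumption. The dictionary and function names are hypothetical.

```python
# Values from the net { } section of the drbd.conf above.
NET_OPTIONS = {"timeout": 60, "connect-int": 10, "ping-int": 10, "ping-timeout": 50}

# Deciseconds per configured unit, following the unit comments in that config:
# timeout and ping-timeout count tenths of a second, the two intervals count seconds.
DECISECONDS_PER_UNIT = {"timeout": 1, "connect-int": 10, "ping-int": 10, "ping-timeout": 1}


def in_seconds(options):
    """Convert each option to seconds via integer deciseconds."""
    return {name: value * DECISECONDS_PER_UNIT[name] / 10 for name, value in options.items()}


def timeout_is_consistent(options):
    """Assumed rule (drbd.conf(5)): timeout must be lower than connect-int and ping-int."""
    s = in_seconds(options)
    return s["timeout"] < s["connect-int"] and s["timeout"] < s["ping-int"]


if __name__ == "__main__":
    for name, seconds in in_seconds(NET_OPTIONS).items():
        print(f"{name:13} {seconds:5.1f} s")
    print("timeout below both intervals:", timeout_is_consistent(NET_OPTIONS))
```

With these values `ping-timeout 50` works out to 5 seconds, and `timeout` (6 s) stays below both 10 s intervals.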



Running 'drbdadm primary r2' by hand promotes the resource to primary without any problem.
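`drbdadm role <resource>` prints the local and peer roles separated by a slash (for example `Secondary/Primary`). The Python sketch below scripts the manual promotion described above and only runs `drbdadm primary` when the local side is not Primary yet. It requires drbdadm on the PATH and root privileges. The function names are hypothetical, and the sketch only automates the manual step: it does not address why the automatic promotion fails.

```python
import subprocess


def parse_role(output):
    """Split `drbdadm role <res>` output like "Secondary/Primary" into (local, peer)."""
    local, _, peer = output.strip().partition("/")
    return local, peer


def needs_promotion(output):
    """True when the local side is not Primary yet."""
    return parse_role(output)[0] != "Primary"


def promote(resource):
    """Query the role of `resource` and promote it only if needed (runs drbdadm)."""
    role = subprocess.run(["drbdadm", "role", resource],
                          capture_output=True, text=True, check=True).stdout
    if needs_promotion(role):
        subprocess.run(["drbdadm", "primary", resource], check=True)
    return parse_role(role)
```

For example, `promote("r2")` on tweety1 would issue the same `drbdadm primary r2` that works when typed by hand.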


Am I doing something wrong?

Thank you all for any help.

Theophanis Kontogiannis


