<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.26.3">
</HEAD>
<BODY>
Hello all,<BR>
<BR>
I have a two-node cluster running CentOS 5.4, with drbd-8.3.5-3 compiled by myself against kernel 2.6.18-164.6.1.el5.<BR>
<BR>
The cluster runs the drbd resources in primary/primary mode.<BR>
<BR>
However, it is always on tweety1 that the resources fail to get promoted: drbd2 almost every time, and drbd0 and drbd1 sometimes.<BR>
<BR>
The following are the kernel messages on tweety1 for the last event related to r2:<BR>
<BR>
block drbd2: Starting worker thread (from cqueue/0 [183])<BR>
block drbd2: disk( Diskless -> Attaching ) <BR>
block drbd2: Found 6 transactions (276 active extents) in activity log.<BR>
block drbd2: Method to ensure write ordering: barrier<BR>
block drbd2: max_segment_size ( = BIO size ) = 32768<BR>
block drbd2: drbd_bm_resize called with capacity == 1953460304<BR>
block drbd2: resync bitmap: bits=244182538 words=3815353<BR>
block drbd2: size = 931 GB (976730152 KB)<BR>
block drbd2: recounting of set bits took additional 70 jiffies<BR>
block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.<BR>
block drbd2: disk( Attaching -> Outdated ) pdsk( DUnknown -> Outdated ) <BR>
block drbd2: conn( StandAlone -> Unconnected ) <BR>
block drbd2: Starting receiver thread (from drbd2_worker [2746])<BR>
block drbd2: receiver (re)started<BR>
block drbd2: conn( Unconnected -> WFConnection ) <BR>
block drbd2: Handshake successful: Agreed network protocol version 91<BR>
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC<BR>
block drbd2: conn( WFConnection -> WFReportParams ) <BR>
block drbd2: Starting asender thread (from drbd2_receiver [2766])<BR>
block drbd2: data-integrity-alg: &lt;not-used&gt;<BR>
block drbd2: drbd_sync_handshake:<BR>
block drbd2: self 9CFD298D943949EE:0000000000000000:9C3AD9517D750E0D:8A45AE63D53852DB bits:0 flags:0<BR>
block drbd2: peer E3BC1675D02BC8BD:9CFD298D943949EF:9C3AD9517D750E0D:8A45AE63D53852DB bits:127 flags:0<BR>
block drbd2: uuid_compare()=-1 by rule 50<BR>
block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( Outdated -> UpToDate ) <BR>
block drbd2: conn( WFBitMapT -> WFSyncUUID ) <BR>
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2<BR>
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)<BR>
block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) <BR>
block drbd2: Began resync as SyncTarget (will sync 508 KB [127 bits set]).<BR>
block drbd2: Resync done (total 1 sec; paused 0 sec; 508 K/sec)<BR>
block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) <BR>
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2<BR>
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)<BR>
block drbd2: peer( Primary -> Secondary ) <BR>
block drbd2: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) <BR>
block drbd2: meta connection shut down by peer.<BR>
block drbd2: asender terminated<BR>
block drbd2: Terminating asender thread<BR>
block drbd2: Connection closed<BR>
block drbd2: conn( TearDown -> Unconnected ) <BR>
block drbd2: receiver terminated<BR>
block drbd2: Restarting receiver thread<BR>
block drbd2: receiver (re)started<BR>
block drbd2: conn( Unconnected -> WFConnection ) <BR>
block drbd2: Handshake successful: Agreed network protocol version 91<BR>
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC<BR>
block drbd2: conn( WFConnection -> WFReportParams ) <BR>
block drbd2: Starting asender thread (from drbd2_receiver [2766])<BR>
block drbd2: data-integrity-alg: &lt;not-used&gt;<BR>
block drbd2: drbd_sync_handshake:<BR>
block drbd2: self E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0<BR>
block drbd2: peer E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0<BR>
block drbd2: uuid_compare()=0 by rule 40<BR>
block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate ) <BR>
block drbd2: peer( Secondary -> Primary )<BR>
<BR>
<BR>
And on tweety2:<BR>
<BR>
block drbd2: Starting worker thread (from cqueue/0 [183])<BR>
block drbd2: disk( Diskless -> Attaching ) <BR>
block drbd2: Found 6 transactions (276 active extents) in activity log.<BR>
block drbd2: Method to ensure write ordering: barrier<BR>
block drbd2: max_segment_size ( = BIO size ) = 32768<BR>
block drbd2: drbd_bm_resize called with capacity == 1953460304<BR>
block drbd2: resync bitmap: bits=244182538 words=3815353<BR>
block drbd2: size = 931 GB (976730152 KB)<BR>
block drbd2: recounting of set bits took additional 70 jiffies<BR>
block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.<BR>
block drbd2: disk( Attaching -> Outdated ) pdsk( DUnknown -> Outdated ) <BR>
block drbd2: conn( StandAlone -> Unconnected ) <BR>
block drbd2: Starting receiver thread (from drbd2_worker [2746])<BR>
block drbd2: receiver (re)started<BR>
block drbd2: conn( Unconnected -> WFConnection ) <BR>
block drbd2: Handshake successful: Agreed network protocol version 91<BR>
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC<BR>
block drbd2: conn( WFConnection -> WFReportParams ) <BR>
block drbd2: Starting asender thread (from drbd2_receiver [2766])<BR>
block drbd2: data-integrity-alg: &lt;not-used&gt;<BR>
block drbd2: drbd_sync_handshake:<BR>
block drbd2: self 9CFD298D943949EE:0000000000000000:9C3AD9517D750E0D:8A45AE63D53852DB bits:0 flags:0<BR>
block drbd2: peer E3BC1675D02BC8BD:9CFD298D943949EF:9C3AD9517D750E0D:8A45AE63D53852DB bits:127 flags:0<BR>
block drbd2: uuid_compare()=-1 by rule 50<BR>
block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( Outdated -> UpToDate ) <BR>
block drbd2: conn( WFBitMapT -> WFSyncUUID ) <BR>
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2<BR>
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)<BR>
block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) <BR>
block drbd2: Began resync as SyncTarget (will sync 508 KB [127 bits set]).<BR>
block drbd2: Resync done (total 1 sec; paused 0 sec; 508 K/sec)<BR>
block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) <BR>
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2<BR>
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)<BR>
block drbd2: peer( Primary -> Secondary ) <BR>
block drbd2: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) <BR>
block drbd2: meta connection shut down by peer.<BR>
block drbd2: asender terminated<BR>
block drbd2: Terminating asender thread<BR>
block drbd2: Connection closed<BR>
block drbd2: conn( TearDown -> Unconnected ) <BR>
block drbd2: receiver terminated<BR>
block drbd2: Restarting receiver thread<BR>
block drbd2: receiver (re)started<BR>
block drbd2: conn( Unconnected -> WFConnection ) <BR>
block drbd2: Handshake successful: Agreed network protocol version 91<BR>
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC<BR>
block drbd2: conn( WFConnection -> WFReportParams ) <BR>
block drbd2: Starting asender thread (from drbd2_receiver [2766])<BR>
block drbd2: data-integrity-alg: &lt;not-used&gt;<BR>
block drbd2: drbd_sync_handshake:<BR>
block drbd2: self E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0<BR>
block drbd2: peer E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0<BR>
block drbd2: uuid_compare()=0 by rule 40<BR>
block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate ) <BR>
block drbd2: peer( Secondary -> Primary )<BR>
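<BR>
To make the transitions easier to compare between the two nodes, here is a minimal Python sketch (not part of DRBD; the function name and the sample lines are my own choices) that pulls the state changes out of such kernel messages. It assumes the "name( Old -> New )" format shown above, and that a local promotion would be logged as a "role( ... )" transition, which never appears in the excerpts.<BR>
<BR>
<PRE>
```python
import re

# Matches one "name( Old -> New )" group as printed in the kernel messages above.
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def parse_transitions(lines):
    """Return (field, old, new) tuples for every state change found in lines."""
    found = []
    for line in lines:
        found.extend(TRANSITION.findall(line))
    return found


# Three lines copied from the log excerpt above.
sample = [
    "block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) ",
    "block drbd2: peer( Primary -> Secondary ) ",
    "block drbd2: peer( Secondary -> Primary )",
]

transitions = parse_transitions(sample)

# Assumption: a promotion of the local node would show up under the "role" field.
local_role_changes = [t for t in transitions if t[0] == "role"]

print(transitions)
print("local role changes:", local_role_changes)
```
</PRE>
Feeding the complete log of each node through parse_transitions and filtering on the "role" field should show at a glance whether the local node ever attempted the promotion.<BR>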
<BR>
<BR>
<BR>
My drbd.conf is:<BR>
<BR>
<BR>
global {<BR>
# minor-count 64;<BR>
<BR>
# dialog-refresh 5; # 5 seconds<BR>
<BR>
# disable-ip-verification;<BR>
<BR>
usage-count yes;<BR>
}<BR>
<BR>
<BR>
<BR>
common {<BR>
<BR>
protocol C;<BR>
<BR>
syncer {<BR>
<BR>
rate 100M;<BR>
<BR>
#after "r2";<BR>
al-extents 257;<BR>
}<BR>
<BR>
handlers {<BR>
<BR>
<BR>
pri-on-incon-degr "echo b > /proc/sysrq-trigger ; reboot -f";<BR>
pri-lost-after-sb "echo b > /proc/sysrq-trigger ; reboot -f";<BR>
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";<BR>
<BR>
<BR>
outdate-peer "/sbin/obliterate";<BR>
<BR>
pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root; echo b > /proc/sysrq-trigger ; reboot -f";<BR>
<BR>
split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";<BR>
<BR>
}<BR>
<BR>
startup {<BR>
wfc-timeout 60;<BR>
<BR>
degr-wfc-timeout 60; # 1 minute.<BR>
#wait-after-sb;<BR>
<BR>
outdated-wfc-timeout 30;<BR>
<BR>
become-primary-on both;<BR>
<BR>
}<BR>
<BR>
disk {<BR>
#on-io-error pass-on;<BR>
<BR>
fencing resource-and-stonith;<BR>
# size 10G;<BR>
}<BR>
<BR>
net {<BR>
<BR>
sndbuf-size 512k;<BR>
<BR>
timeout 60; # 6 seconds (unit = 0.1 seconds)<BR>
connect-int 10; # 10 seconds (unit = 1 second)<BR>
ping-int 10; # 10 seconds (unit = 1 second)<BR>
ping-timeout 50; # 5 seconds (unit = 0.1 seconds)<BR>
<BR>
max-buffers 2048;<BR>
<BR>
# unplug-watermark 128;<BR>
max-epoch-size 2048;<BR>
ko-count 10;<BR>
<BR>
allow-two-primaries;<BR>
<BR>
cram-hmac-alg "*****";<BR>
shared-secret "*****";<BR>
after-sb-0pri discard-least-changes;<BR>
#after-sb-0pri discard-younger-primary;<BR>
#after-sb-0pri discard-older-primary;<BR>
<BR>
after-sb-1pri violently-as0p;<BR>
<BR>
after-sb-2pri violently-as0p;<BR>
rr-conflict call-pri-lost;<BR>
<BR>
<BR>
# data-integrity-alg "crc32c";<BR>
<BR>
}<BR>
<BR>
<BR>
}<BR>
<BR>
<BR>
resource r0 {<BR>
<BR>
device /dev/drbd0;<BR>
disk /dev/hda4;<BR>
meta-disk internal;<BR>
<BR>
on tweety-1 { address 10.254.254.253:7788; }<BR>
<BR>
on tweety-2 { address 10.254.254.254:7788; }<BR>
<BR>
}<BR>
<BR>
resource r1 {<BR>
<BR>
device /dev/drbd1;<BR>
disk /dev/hdb4;<BR>
meta-disk internal;<BR>
<BR>
on tweety-1 { address 10.254.254.253:7789; }<BR>
<BR>
on tweety-2 { address 10.254.254.254:7789; }<BR>
}<BR>
<BR>
resource r2 {<BR>
<BR>
device /dev/drbd2;<BR>
disk /dev/sda1;<BR>
meta-disk internal;<BR>
<BR>
on tweety-1 { address 10.254.254.253:7790; }<BR>
<BR>
on tweety-2 { address 10.254.254.254:7790; }<BR>
}<BR>
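<BR>
As a quick sanity check of the units in the net section, here is a small Python sketch (my own helper, not anything shipped with DRBD) that converts the four timer settings to seconds. It assumes the units given in the comments: tenths of a second for timeout and ping-timeout, whole seconds for connect-int and ping-int.<BR>
<BR>
<PRE>
```python
# Values copied from the net section above.
# Each entry is (configured value, units per second), following the
# units stated in the config comments (an assumption of this sketch).
NET_TIMERS = {
    "timeout": (60, 10),       # unit = 0.1 seconds
    "connect-int": (10, 1),    # unit = 1 second
    "ping-int": (10, 1),       # unit = 1 second
    "ping-timeout": (50, 10),  # unit = 0.1 seconds
}


def to_seconds(settings):
    """Convert each configured value to seconds."""
    return {name: value / per_second
            for name, (value, per_second) in settings.items()}


seconds = to_seconds(NET_TIMERS)
for name, secs in seconds.items():
    print(f"{name}: {secs:g} s")
```
</PRE>
With the values above, timeout comes out as 6 seconds and ping-timeout as 5 seconds.<BR>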
<BR>
<BR>
<BR>
Running 'drbdadm primary r2' by hand promotes the resource to primary without any problem.<BR>
<BR>
<BR>
Am I doing something wrong?<BR>
<BR>
Thank you all for any help.<BR>
<BR>
Theophanis Kontogiannis<BR>
<BR>
</BODY>
</HTML>