Hello all,

I have a two-node cluster on CentOS 5.4, with drbd-8.3.5-3 self-compiled against kernel 2.6.18-164.6.1.el5. The cluster runs the DRBD resources in primary/primary mode. However, on tweety1 the resources do not get promoted: drbd2 almost every time, and drbd0 and drbd1 some of the time.

The following are the kernel messages on tweety1 for the last event related to r2:

block drbd2: Starting worker thread (from cqueue/0 [183])
block drbd2: disk( Diskless -> Attaching )
block drbd2: Found 6 transactions (276 active extents) in activity log.
block drbd2: Method to ensure write ordering: barrier
block drbd2: max_segment_size ( = BIO size ) = 32768
block drbd2: drbd_bm_resize called with capacity == 1953460304
block drbd2: resync bitmap: bits=244182538 words=3815353
block drbd2: size = 931 GB (976730152 KB)
block drbd2: recounting of set bits took additional 70 jiffies
block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd2: disk( Attaching -> Outdated ) pdsk( DUnknown -> Outdated )
block drbd2: conn( StandAlone -> Unconnected )
block drbd2: Starting receiver thread (from drbd2_worker [2746])
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection )
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams )
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self 9CFD298D943949EE:0000000000000000:9C3AD9517D750E0D:8A45AE63D53852DB bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BD:9CFD298D943949EF:9C3AD9517D750E0D:8A45AE63D53852DB bits:127 flags:0
block drbd2: uuid_compare()=-1 by rule 50
block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( Outdated -> UpToDate )
block drbd2: conn( WFBitMapT -> WFSyncUUID )
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)
block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
block drbd2: Began resync as SyncTarget (will sync 508 KB [127 bits set]).
block drbd2: Resync done (total 1 sec; paused 0 sec; 508 K/sec)
block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)
block drbd2: peer( Primary -> Secondary )
block drbd2: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
block drbd2: meta connection shut down by peer.
block drbd2: asender terminated
block drbd2: Terminating asender thread
block drbd2: Connection closed
block drbd2: conn( TearDown -> Unconnected )
block drbd2: receiver terminated
block drbd2: Restarting receiver thread
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection )
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams )
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: uuid_compare()=0 by rule 40
block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
block drbd2: peer( Secondary -> Primary )

and on tweety2:

block drbd2: Starting worker thread (from cqueue/0 [183])
block drbd2: disk( Diskless -> Attaching )
block drbd2: Found 6 transactions (276 active extents) in activity log.
block drbd2: Method to ensure write ordering: barrier
block drbd2: max_segment_size ( = BIO size ) = 32768
block drbd2: drbd_bm_resize called with capacity == 1953460304
block drbd2: resync bitmap: bits=244182538 words=3815353
block drbd2: size = 931 GB (976730152 KB)
block drbd2: recounting of set bits took additional 70 jiffies
block drbd2: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd2: disk( Attaching -> Outdated ) pdsk( DUnknown -> Outdated )
block drbd2: conn( StandAlone -> Unconnected )
block drbd2: Starting receiver thread (from drbd2_worker [2746])
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection )
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams )
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self 9CFD298D943949EE:0000000000000000:9C3AD9517D750E0D:8A45AE63D53852DB bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BD:9CFD298D943949EF:9C3AD9517D750E0D:8A45AE63D53852DB bits:127 flags:0
block drbd2: uuid_compare()=-1 by rule 50
block drbd2: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( Outdated -> UpToDate )
block drbd2: conn( WFBitMapT -> WFSyncUUID )
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm before-resync-target minor-2 exit code 0 (0x0)
block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
block drbd2: Began resync as SyncTarget (will sync 508 KB [127 bits set]).
block drbd2: Resync done (total 1 sec; paused 0 sec; 508 K/sec)
block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2
block drbd2: helper command: /sbin/drbdadm after-resync-target minor-2 exit code 0 (0x0)
block drbd2: peer( Primary -> Secondary )
block drbd2: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
block drbd2: meta connection shut down by peer.
block drbd2: asender terminated
block drbd2: Terminating asender thread
block drbd2: Connection closed
block drbd2: conn( TearDown -> Unconnected )
block drbd2: receiver terminated
block drbd2: Restarting receiver thread
block drbd2: receiver (re)started
block drbd2: conn( Unconnected -> WFConnection )
block drbd2: Handshake successful: Agreed network protocol version 91
block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd2: conn( WFConnection -> WFReportParams )
block drbd2: Starting asender thread (from drbd2_receiver [2766])
block drbd2: data-integrity-alg: <not-used>
block drbd2: drbd_sync_handshake:
block drbd2: self E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: peer E3BC1675D02BC8BC:0000000000000000:F05089B922FCC908:9CFD298D943949EF bits:0 flags:0
block drbd2: uuid_compare()=0 by rule 40
block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
block drbd2: peer( Secondary -> Primary )

My drbd.conf is:

global {
  # minor-count 64;
  # dialog-refresh 5; # 5 seconds
  # disable-ip-verification;
  usage-count yes;
}

common {
  protocol C;

  syncer {
    rate 100M;
    #after "r2";
    al-extents 257;
  }

  handlers {
    pri-on-incon-degr "echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/sbin/obliterate";
    pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root; echo b > /proc/sysrq-trigger ; reboot -f";
    split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
  }

  startup {
    wfc-timeout 60;
    degr-wfc-timeout 60; # 1 minute.
    #wait-after-sb;
    outdated-wfc-timeout 30;
    become-primary-on both;
  }

  disk {
    #on-io-error pass-on;
    fencing resource-and-stonith;
    # size 10G;
  }

  net {
    sndbuf-size 512k;
    timeout 60; # 6 seconds (unit = 0.1 seconds)
    connect-int 10; # 10 seconds (unit = 1 second)
    ping-int 10; # 10 seconds (unit = 1 second)
    ping-timeout 50; # 5 seconds (unit = 0.1 seconds)
    max-buffers 2048;
    # unplug-watermark 128;
    max-epoch-size 2048;
    ko-count 10;
    allow-two-primaries;
    cram-hmac-alg "*****";
    shared-secret "*****";
    after-sb-0pri discard-least-changes;
    #after-sb-0pri discard-younger-primary;
    #after-sb-0pri discard-older-primary;
    after-sb-1pri violently-as0p;
    after-sb-2pri violently-as0p;
    rr-conflict call-pri-lost;
    # data-integrity-alg "crc32c";
  }
}

resource r0 {
  device /dev/drbd0;
  disk /dev/hda4;
  meta-disk internal;
  on tweety-1 {
    address 10.254.254.253:7788;
  }
  on tweety-2 {
    address 10.254.254.254:7788;
  }
}

resource r1 {
  device /dev/drbd1;
  disk /dev/hdb4;
  meta-disk internal;
  on tweety-1 {
    address 10.254.254.253:7789;
  }
  on tweety-2 {
    address 10.254.254.254:7789;
  }
}

resource r2 {
  device /dev/drbd2;
  disk /dev/sda1;
  meta-disk internal;
  on tweety-1 {
    address 10.254.254.253:7790;
  }
  on tweety-2 {
    address 10.254.254.254:7790;
  }
}

Running 'drbdadm primary r2' by hand promotes the resource to primary without any problem. Am I doing something wrong?

Thank you all for any help.

Theophanis Kontogiannis

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20091115/2a0eb914/attachment.htm>
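For anyone trying to follow the logs quoted in this message: DRBD reports each state change as "field( old -> new )", for example disk, pdsk, conn, peer, and (for the local node) role. The following is a minimal sketch in Python, not a DRBD tool, that replays those transitions for one minor and reports the last value of each field. The function name replay_states is hypothetical, and the sketch assumes one kernel message per line, as in the logs above, and that a local promotion would appear as "role( Secondary -> Primary )", as DRBD 8.3 kernel messages print it.

```python
import re

# Matches DRBD state-change fragments as they appear in the kernel log,
# e.g. "conn( WFConnection -> WFReportParams )" or "pdsk( DUnknown -> Outdated )".
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def replay_states(log_lines, minor="drbd2"):
    """Replay the state transitions logged for one DRBD minor (hypothetical helper).

    Returns (transitions, final): transitions is a list of (field, old, new)
    tuples in log order; final maps each field (role, peer, conn, disk,
    pdsk, ...) to the last value it was set to.
    """
    marker = f"block {minor}:"  # the colon keeps "drbd2" from matching "drbd20"
    transitions = []
    final = {}
    for line in log_lines:
        # Accept lines with or without a syslog timestamp prefix.
        if marker not in line:
            continue
        for field, old, new in TRANSITION.findall(line):
            transitions.append((field, old, new))
            final[field] = new
    return transitions, final


if __name__ == "__main__":
    # A few lines copied from the tweety1 log in this message.
    sample = [
        "block drbd2: disk( Attaching -> Outdated ) pdsk( DUnknown -> Outdated )",
        "block drbd2: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )",
        "block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )",
        "block drbd2: peer( Secondary -> Primary )",
    ]
    transitions, final = replay_states(sample)
    print(final)
    # True only if a "role( ... )" transition for the local node was logged.
    print("local role changed:", any(f == "role" for f, _, _ in transitions))
```

Applied to the tweety1 excerpt, the peer ends as Primary and the local disk as UpToDate, but no local role transition is recorded at all, which matches the symptom described above.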