[DRBD-user] info about messages in drbd stop/start in dual primary config

Gianluca Cecchi gianluca.cecchi at gmail.com
Tue Jul 28 17:36:32 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


hello,
I'm in a dual primary config with 8.3.2 on f11 x86_64, kernel
2.6.29.5-191.fc11.x86_64.
Behaviour seems ok, but I have doubts related to the log lines below
marked with >>>
output of "drbdadm dump" is at the end.

Also one other question: from drbd point of view, should these two
operations be the same and so managed the same way?

1)
node1 and node2 active in dual primary
service drbdadm stop on node2
wait some seconds
service drbdadm start on node2

2)
node1 and node2 active in dual primary
drbd configured to start and stop in init scripts
shutdown -r on node 2

Or are there any differences or latencies when I do a shutdown (so in
practice a stop/start of drbd but with greater delay and with network
interface of node2 going down...)

Thanks in advance,
Gianluca

starting point is:
node1
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
root at virtfed.domain.com, 2009-07-16 11:03:37
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:55143748 nr:211304 dw:964992 dr:56018052 al:548 bm:3711 lo:0
pe:0 ua:0 ap:0 ep:1 wo:d oos:0

node2
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
root at virtfed.domain.com, 2009-07-16 11:03:37
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:10088 dw:10088 dr:92 al:0 bm:18 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

On node2 (using init script as provided by installation form source):
[root at virtfedbis ~]# service drbd stop
Stopping all DRBD resources: .
[root at virtfedbis ~]# echo $?
0

Now on node1:
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
root at virtfed.domain.com, 2009-07-16 11:03:37
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated C r----
    ns:55144420 nr:211304 dw:966164 dr:56018052 al:548 bm:3711 lo:0
pe:0 ua:0 ap:0 ep:1 wo:d oos:436

logs:
node1:
Jul 27 15:15:41 virtfed kernel: block drbd0: peer( Primary -> Secondary )
>>> Jul 27 15:15:41 virtfed kernel: block drbd0: State change failed: Refusing to be Primary while peer is not outdated
Jul 27 15:15:41 virtfed kernel: block drbd0:   state = { cs:Connected
ro:Primary/Secondary ds:UpToDate/UpToDate r--- }
Jul 27 15:15:41 virtfed kernel: block drbd0:  wanted = { cs:TearDown
ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jul 27 15:15:41 virtfed kernel: block drbd0: peer( Secondary ->
Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> Outdated )
Jul 27 15:15:41 virtfed kernel: block drbd0: Creating new current UUID
Jul 27 15:15:41 virtfed kernel: block drbd0: asender terminated
Jul 27 15:15:41 virtfed kernel: block drbd0: Terminating asender thread
Jul 27 15:15:41 virtfed kernel: block drbd0: Connection closed
Jul 27 15:15:41 virtfed kernel: block drbd0: conn( TearDown -> Unconnected )
Jul 27 15:15:41 virtfed kernel: block drbd0: receiver terminated
Jul 27 15:15:41 virtfed kernel: block drbd0: Restarting receiver thread
Jul 27 15:15:41 virtfed kernel: block drbd0: receiver (re)started
Jul 27 15:15:41 virtfed kernel: block drbd0: conn( Unconnected ->
WFConnection )

logs node2:
Jul 27 15:15:41 virtfedbis kernel: block drbd0: role( Primary -> Secondary )
>>> Jul 27 15:15:41 virtfedbis kernel: block drbd0: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Jul 27 15:15:41 virtfedbis kernel: block drbd0: meta connection shut
down by peer.
Jul 27 15:15:41 virtfedbis kernel: block drbd0: peer( Primary ->
Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
DUnknown )
Jul 27 15:15:41 virtfedbis kernel: block drbd0: asender terminated
Jul 27 15:15:41 virtfedbis kernel: block drbd0: Terminating asender thread
Jul 27 15:15:41 virtfedbis kernel: block drbd0: conn( NetworkFailure
-> Disconnecting ) disk( UpToDate -> Outdated )
Jul 27 15:15:41 virtfedbis kernel: block drbd0: Connection closed
Jul 27 15:15:41 virtfedbis kernel: block drbd0: conn( Disconnecting ->
StandAlone )
Jul 27 15:15:41 virtfedbis kernel: block drbd0: receiver terminated
Jul 27 15:15:41 virtfedbis kernel: block drbd0: Terminating receiver thread
Jul 27 15:15:41 virtfedbis kernel: block drbd0: disk( Outdated -> Diskless )
Jul 27 15:15:41 virtfedbis kernel: block drbd0: drbd_bm_resize called
with capacity == 0
Jul 27 15:15:41 virtfedbis kernel: block drbd0: worker terminated
Jul 27 15:15:41 virtfedbis kernel: block drbd0: Terminating worker thread
Jul 27 15:15:41 virtfedbis kernel: drbd: module cleanup done.
Jul 27 15:15:41 virtfedbis kernel: drbd: initialised. Version: 8.3.2
(api:88/proto:86-90)
Jul 27 15:15:41 virtfedbis kernel: drbd: GIT-hash:
dd7985327f146f33b86d4bff5ca8c94234ce840e build by
root at virtfed.domain.com, 2009-07-16 11:03:37
Jul 27 15:15:41 virtfedbis kernel: drbd: registered as block device major 147
Jul 27 15:15:41 virtfedbis kernel: drbd: minor_table @ 0xffff8808293d1200

Then restart of drbd on node2:
[root at virtfedbis ~]# service drbd start
Starting DRBD resources: [ d(r0) s(r0) n(r0) ].

now situation is
node1
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
root at virtfed.domain.com, 2009-07-16 11:03:37
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:55146520 nr:211304 dw:968180 dr:56019964 al:548 bm:3720 lo:0
pe:0 ua:0 ap:0 ep:1 wo:d oos:0

node2
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
root at virtfed.domain.com, 2009-07-16 11:03:37
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:3764 dw:3764 dr:92 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

logs node1:
Jul 27 15:17:07 virtfed kernel: block drbd0: Handshake successful:
Agreed network protocol version 90
Jul 27 15:17:07 virtfed kernel: block drbd0: Peer authenticated using
20 bytes of 'sha1' HMAC
Jul 27 15:17:07 virtfed kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Jul 27 15:17:07 virtfed kernel: block drbd0: Starting asender thread
(from drbd0_receiver [3195])
Jul 27 15:17:07 virtfed kernel: block drbd0: data-integrity-alg: <not-used>
Jul 27 15:17:07 virtfed kernel: block drbd0: drbd_sync_handshake:
Jul 27 15:17:07 virtfed kernel: block drbd0: self
A54E721233CDE951:484E7F140D5A9EC5:8AF49A12F8186DD2:00C81C83671779C1
bits:478 flags:0
Jul 27 15:17:07 virtfed kernel: block drbd0: peer
484E7F140D5A9EC4:0000000000000000:8AF49A12F8186DD3:00C81C83671779C1
bits:0 flags:0
Jul 27 15:17:07 virtfed kernel: block drbd0: uuid_compare()=1 by rule 7
Jul 27 15:17:07 virtfed kernel: block drbd0: peer( Unknown ->
Secondary ) conn( WFReportParams -> WFBitMapS )
Jul 27 15:17:07 virtfed kernel: block drbd0: peer( Secondary -> Primary )
Jul 27 15:17:07 virtfed kernel: block drbd0: conn( WFBitMapS ->
SyncSource ) pdsk( Outdated -> Inconsistent )
Jul 27 15:17:07 virtfed kernel: block drbd0: Began resync as
SyncSource (will sync 1912 KB [478 bits set]).
Jul 27 15:17:07 virtfed kernel: block drbd0: Resync done (total 1 sec;
paused 0 sec; 1912 K/sec)
Jul 27 15:17:07 virtfed kernel: block drbd0: conn( SyncSource ->
Connected ) pdsk( Inconsistent -> UpToDate )

logs node2:
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Starting worker thread
(from cqueue [64])
Jul 27 15:17:07 virtfedbis kernel: block drbd0: disk( Diskless -> Attaching )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Found 6 transactions
(64 active extents) in activity log.
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Method to ensure write
ordering: barrier
Jul 27 15:17:07 virtfedbis kernel: block drbd0: max_segment_size ( =
BIO size ) = 32768
Jul 27 15:17:07 virtfedbis kernel: block drbd0: drbd_bm_resize called
with capacity == 109317376
Jul 27 15:17:07 virtfedbis kernel: block drbd0: resync bitmap:
bits=13664672 words=213511
Jul 27 15:17:07 virtfedbis kernel: block drbd0: size = 52 GB (54658688 KB)
Jul 27 15:17:07 virtfedbis kernel: block drbd0: recounting of set bits
took additional 1 jiffies
Jul 27 15:17:07 virtfedbis kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jul 27 15:17:07 virtfedbis kernel: block drbd0: disk( Attaching -> Outdated )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Barriers not supported
on meta data device - disabling
Jul 27 15:17:07 virtfedbis kernel: block drbd0: conn( StandAlone ->
Unconnected )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Starting receiver
thread (from drbd0_worker [9487])
Jul 27 15:17:07 virtfedbis kernel: block drbd0: receiver (re)started
Jul 27 15:17:07 virtfedbis kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Handshake successful:
Agreed network protocol version 90
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Peer authenticated
using 20 bytes of 'sha1' HMAC
Jul 27 15:17:07 virtfedbis kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Starting asender
thread (from drbd0_receiver [9506])
Jul 27 15:17:07 virtfedbis kernel: block drbd0: data-integrity-alg: <not-used>
Jul 27 15:17:07 virtfedbis kernel: block drbd0: drbd_sync_handshake:
Jul 27 15:17:07 virtfedbis kernel: block drbd0: self
484E7F140D5A9EC4:0000000000000000:8AF49A12F8186DD3:00C81C83671779C1
bits:0 flags:0
Jul 27 15:17:07 virtfedbis kernel: block drbd0: peer
A54E721233CDE951:484E7F140D5A9EC5:8AF49A12F8186DD2:00C81C83671779C1
bits:478 flags:0
Jul 27 15:17:07 virtfedbis kernel: block drbd0: uuid_compare()=-1 by rule 5
Jul 27 15:17:07 virtfedbis kernel: block drbd0: peer( Unknown ->
Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown ->
UpToDate )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: role( Secondary -> Primary )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-target minor-0
Jul 27 15:17:07 virtfedbis kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jul 27 15:17:07 virtfedbis kernel: block drbd0: conn( WFSyncUUID ->
SyncTarget ) disk( Outdated -> Inconsistent )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Began resync as
SyncTarget (will sync 1912 KB [478 bits set]).
Jul 27 15:17:07 virtfedbis kernel: block drbd0: Resync done (total 1
sec; paused 0 sec; 1912 K/sec)
Jul 27 15:17:07 virtfedbis kernel: block drbd0: conn( SyncTarget ->
Connected ) disk( Inconsistent -> UpToDate )
Jul 27 15:17:07 virtfedbis kernel: block drbd0: helper command:
/sbin/drbdadm after-resync-target minor-0
Jul 27 15:17:07 virtfedbis kernel: block drbd0: helper command:
/sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
Jul 27 15:17:09 virtfedbis kernel: block drbd0: Method to ensure write
ordering: flush
>>> Jul 27 15:17:09 virtfedbis kernel: end_request: I/O error, dev cciss/c0d0, sector 0
>>> Jul 27 15:17:09 virtfedbis kernel: block drbd0: local disk flush failed with status -95
>>> Jul 27 15:17:09 virtfedbis kernel: block drbd0: Method to ensure write ordering: drain


[root at virtfed ~]# drbdadm dump
# /etc/drbd.conf
common {
    syncer {
        rate             100M;
    }
}

# resource r0 on virtfed.domain.com: not ignored, not stacked
resource r0 {
    protocol               C;
    on virtfed.domain.com {
        device           /dev/drbd0 minor 0;
        disk             /dev/cciss/c0d0p3;
        address          ipv4 192.168.16.111:7788;
        flexible-meta-disk /dev/cciss/c0d0p4;
    }
    on virtfedbis.domain.com {
        device           /dev/drbd0 minor 0;
        disk             /dev/cciss/c0d0p3;
        address          ipv4 192.168.16.112:7788;
        flexible-meta-disk /dev/cciss/c0d0p4;
    }
    net {
        allow-two-primaries;
        cram-hmac-alg    sha1;
        shared-secret    kvmdrbd;
        after-sb-0pri    discard-zero-changes;
        after-sb-1pri    discard-secondary;
        after-sb-2pri    disconnect;
        rr-conflict      disconnect;
    }
    disk {
        on-io-error      detach;
        fencing          resource-and-stonith;
    }
    syncer {
        al-extents       257;
    }
    startup {
        wfc-timeout       30;
        degr-wfc-timeout  10;
        outdated-wfc-timeout   2;
        become-primary-on both;
    }
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger
; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger
; reboot -f";
        local-io-error   "/usr/lib/drbd/notify-io-error.sh;
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o >
/proc/sysrq-trigger ; halt -f";
        fence-peer       /usr/local/bin/obliterate-peer.sh;
    }
}



More information about the drbd-user mailing list