[DRBD-user] 8.3.5 Stalling on sync

James Larcombe jim at roadtech.co.uk
Tue Nov 24 14:22:11 CET 2009



Hi List,

 

Please help. I have installed DRBD 8.3.5 on openSUSE 11.1 (kernel
2.6.27.29-0.1).

 

I ran drbdadm create-md dbms-test on one node and drbdadm create-md
dbms-test2 on the other node, then drbdadm up all on both nodes. I then ran
drbdadm -- --overwrite-data-of-my-peer primary dbms-test on the first node
and the same for dbms-test2 on the other node. Both resources sync for a
short while and then stall. I have tried older versions without success, and
turning the sync rate down makes no difference. Taking the resources down
and bringing them back up restarts the sync, but it quickly stalls again.

 

I have attached /proc/drbd, /etc/drbd.conf and a section from
/var/log/messages. Any pointers would be greatly appreciated.

 

/proc/drbd

version: 8.3.5 (api:88/proto:86-91)
GIT-hash: ded8cdf09b0efa1460e8ce7a72327c60ff2210fb build by root at hp-tm-40, 2009-11-24 12:21:46
 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
    ns:160896 nr:0 dw:0 dr:160896 al:0 bm:9 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:926694296
        [>.] sync'ed:  0.1% (905040/905132)M      4972
        stalled
 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r----
    ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 ua:0 ap:0 ep:1 wo:b oos:777971256
        [>.] sync'ed:  0.3% (759736/761856)M
        Stalled
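
The counter lines in the status output above are runs of key:value pairs, so they are easy to compare between devices with a short script. What follows is only a sketch, assuming the field layout shown above; parse_counters is a hypothetical helper name, not part of DRBD. According to the DRBD documentation, pe counts requests sent to the peer that have not yet been answered, and oos is the amount of storage currently out of sync, in KiB.

```python
import re


def parse_counters(text):
    """Return {name: value} for every key:value token; integers where possible."""
    counters = {}
    for key, value in re.findall(r"\b([a-z]+):(\S+)", text):
        counters[key] = int(value) if value.isdigit() else value
    return counters


# Counter line for device 1, copied from the status output above.
line = ("ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 "
        "ua:0 ap:0 ep:1 wo:b oos:777971256")
counters = parse_counters(line)
print(counters["pe"], counters["oos"])
```

On a live node the same function could be fed the text of /proc/drbd instead of a pasted line.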

 

 

 

/etc/drbd.conf

 

global {
  # minor-count 64;
  # dialog-refresh 5; # 5 seconds
  # disable-ip-verification;
  usage-count no;
}

common {
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error detach;
    # fencing resource-only;
  }

  net {
    max-buffers 40000;
    unplug-watermark 40000;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;

    rr-conflict disconnect;
  }

  syncer {
    rate 90M;
    al-extents 257;
    verify-alg crc32c;
    cpu-mask 1;
  }
}

resource dbms-test {
  protocol C;

  on hp-tm-40 {
    device    /dev/drbd0;
    disk      /dev/cciss/c0d1p4;
    address   192.168.95.53:7789;
    meta-disk /dev/cciss/c0d1p1[0];
  }

  on hp-tm-41 {
    device    /dev/drbd0;
    disk      /dev/cciss/c0d1p4;
    address   192.168.95.54:7789;
    meta-disk /dev/cciss/c0d1p1[0];
  }
}

resource dbms-test2 {
  protocol C;

  on hp-tm-40 {
    device    /dev/drbd1;
    disk      /dev/cciss/c0d1p3;
    address   192.168.95.53:7788;
    meta-disk /dev/cciss/c0d1p2[0];
  }

  on hp-tm-41 {
    device    /dev/drbd1;
    disk      /dev/cciss/c0d1p3;
    address   192.168.95.54:7788;
    meta-disk /dev/cciss/c0d1p2[0];
  }
}

 

 

Section from /var/log/messages

 

Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: peer( Secondary -> Unknown ) conn( SyncTarget -> TearDown ) pdsk( UpToDate -> DUnknown )
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: asender terminated
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Terminating asender thread
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Connection closed
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: conn( TearDown -> Unconnected )
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: receiver terminated
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Restarting receiver thread
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: receiver (re)started
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: conn( Unconnected -> WFConnection )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: conn( WFConnection -> Disconnecting )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Discarding network configuration.
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Connection closed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: receiver terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Terminating receiver thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: disk( Inconsistent -> Diskless )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: worker terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Terminating worker thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: meta connection shut down by peer.
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: asender terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating asender thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: drbd_pp_alloc interrupted!
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: alloc_ee: Allocation of a page failed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: error receiving RSDataRequest, l: 24!
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Connection closed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: conn( Disconnecting -> StandAlone )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: disk( UpToDate -> Diskless ) pdsk( Inconsistent -> DUnknown )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: net_ee not empty, killed 5000 entries
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: receiver terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating receiver thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: drbd_bm_resize called with capacity == 0
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: worker terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating worker thread

Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Starting worker thread (from cqueue [86])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: disk( Diskless -> Attaching )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: No usable activity log found.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Method to ensure write ordering: barrier
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: drbd_bm_resize called with capacity == 1887428655
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: resync bitmap: bits=235928582 words=3686385
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: size = 900 GB (943714327 KB)
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: recounting of set bits took additional 6 jiffies
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: 884 GB (231676934 bits) marked out-of-sync by on disk bit-map.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: disk( Attaching -> Inconsistent )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Barriers not supported on meta data device - disabling
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Starting worker thread (from cqueue [86])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: disk( Diskless -> Attaching )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: No usable activity log found.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Method to ensure write ordering: barrier
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: max_segment_size ( = BIO size ) = 32768
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: drbd_bm_resize called with capacity == 1887444720
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: resync bitmap: bits=235930590 words=3686416
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: size = 900 GB (943722360 KB)
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: recounting of set bits took additional 6 jiffies
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: 742 GB (194495454 bits) marked out-of-sync by on disk bit-map.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Barriers not supported on meta data device - disabling
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: conn( StandAlone -> Unconnected )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Starting receiver thread (from drbd0_worker [6688])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: receiver (re)started
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: conn( Unconnected -> WFConnection )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: conn( StandAlone -> Unconnected )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Starting receiver thread (from drbd1_worker [6695])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: receiver (re)started
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: conn( Unconnected -> WFConnection )

Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Starting asender thread (from drbd0_receiver [6717])
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: data-integrity-alg: <not-used>
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: drbd_sync_handshake:
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: self 88E0ED22FECE2B68:0000000000000000:0000000000000000:0000000000000000 bits:231676934 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: peer 5299E3A47E1A3F30:88E0ED22FECE2B69:8810A1CE27BB9808:27DB4B359F02FE48 bits:231676934 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: uuid_compare()=-1 by rule 50
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Becoming sync target due to disk states.
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Handshake successful: Agreed network protocol version 91
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Starting asender thread (from drbd1_receiver [6721])
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: data-integrity-alg: <not-used>
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: drbd_sync_handshake:
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: self 12DFCDD264D5E7AE:20C37C56C7437B76:441CA1FB5B900754:4A4B9D0203491EC4 bits:194495454 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: peer 20C37C56C7437B76:0000000000000000:0000000000000000:0000000000000000 bits:194495454 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: uuid_compare()=1 by rule 70
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Becoming sync source due to disk states.
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Began resync as SyncTarget (will sync 926707736 KB [231676934 bits set]).
Nov 24 13:03:54 hp-tm-41 kernel: block drbd1: conn( WFBitMapS -> SyncSource )
Nov 24 13:03:54 hp-tm-41 kernel: block drbd1: Began resync as SyncSource (will sync 777981816 KB [194495454 bits set]).
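
The two "Began resync" messages can be cross-checked against the bitmap counts they report. This is a minimal sketch, assuming each log message sits on a single line (mail-wrapped lines rejoined) and that one bitmap bit covers 4 KiB, which is what the figures in this log imply; resync_summary and BITMAP_BIT_KB are illustrative names, not DRBD identifiers.

```python
import re

# Assumption: one resync-bitmap bit tracks 4 KiB of storage.
BITMAP_BIT_KB = 4

RESYNC_RE = re.compile(
    r"block (drbd\d+): Began resync as (\w+) "
    r"\(will sync (\d+) KB \[(\d+) bits set\]\)"
)


def resync_summary(log_text):
    """Return (device, role, reported_kb, kb_computed_from_bits) per resync start."""
    results = []
    for device, role, kb, bits in RESYNC_RE.findall(log_text):
        results.append((device, role, int(kb), int(bits) * BITMAP_BIT_KB))
    return results


# The two messages from the log above, with wrapped lines rejoined.
log = """\
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Began resync as SyncTarget (will sync 926707736 KB [231676934 bits set]).
Nov 24 13:03:54 hp-tm-41 kernel: block drbd1: Began resync as SyncSource (will sync 777981816 KB [194495454 bits set]).
"""

for device, role, reported_kb, computed_kb in resync_summary(log):
    print(device, role, reported_kb, computed_kb, reported_kb == computed_kb)
```

For both devices the reported KB figure equals the bit count times four, so the amounts queued for resync agree with the on-disk bitmaps.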

 

Thanks

 

 

 

 


