Hi List,

Please help. I have installed DRBD 8.3.5 on openSUSE 11.1 (kernel 2.6.27.29-0.1).

I ran "drbdadm create-md dbms-test" on one node and "drbdadm create-md dbms-test2" on the other node. I then ran "drbdadm up all" on both nodes, followed by "drbdadm -- --overwrite-data-of-my-peer primary dbms-test" on the first node and the same for dbms-test2 on the other node.

The resources sync for a short while and then stall. I have tried older versions without success, and turning the sync rate down makes no difference. Taking the resources down and bringing them back up restarts the sync, but it quickly stalls again.

I have attached /proc/drbd, /etc/drbd.conf and a section from /var/log/messages. Any pointers would be greatly appreciated.

/proc/drbd:

version: 8.3.5 (api:88/proto:86-91)
GIT-hash: ded8cdf09b0efa1460e8ce7a72327c60ff2210fb build by root at hp-tm-40, 2009-11-24 12:21:46
 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
    ns:160896 nr:0 dw:0 dr:160896 al:0 bm:9 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:926694296
    [>.] sync'ed: 0.1% (905040/905132)M
    4972 stalled
 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r----
    ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 ua:0 ap:0 ep:1 wo:b oos:777971256
    [>.] sync'ed: 0.3% (759736/761856)M
    Stalled

/etc/drbd.conf:

global {
    # minor-count 64;
    # dialog-refresh 5; # 5 seconds
    # disable-ip-verification;
    usage-count no;
}

common {
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
    }
    disk {
        on-io-error detach;
        # fencing resource-only;
    }
    net {
        max-buffers 40000;
        unplug-watermark 40000;
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 90M;
        al-extents 257;
        verify-alg crc32c;
        cpu-mask 1;
    }
}

resource dbms-test {
    protocol C;
    on hp-tm-40 {
        device /dev/drbd0;
        disk /dev/cciss/c0d1p4;
        address 192.168.95.53:7789;
        meta-disk /dev/cciss/c0d1p1[0];
    }
    on hp-tm-41 {
        device /dev/drbd0;
        disk /dev/cciss/c0d1p4;
        address 192.168.95.54:7789;
        meta-disk /dev/cciss/c0d1p1[0];
    }
}

resource dbms-test2 {
    protocol C;
    on hp-tm-40 {
        device /dev/drbd1;
        disk /dev/cciss/c0d1p3;
        address 192.168.95.53:7788;
        meta-disk /dev/cciss/c0d1p2[0];
    }
    on hp-tm-41 {
        device /dev/drbd1;
        disk /dev/cciss/c0d1p3;
        address 192.168.95.54:7788;
        meta-disk /dev/cciss/c0d1p2[0];
    }
}

Section from /var/log/messages:

Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: peer( Secondary -> Unknown ) conn( SyncTarget -> TearDown ) pdsk( UpToDate -> DUnknown )
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: asender terminated
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Terminating asender thread
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Connection closed
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: conn( TearDown -> Unconnected )
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: receiver terminated
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Restarting receiver thread
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: receiver (re)started
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: conn( Unconnected -> WFConnection )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: conn( WFConnection -> Disconnecting )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Discarding network configuration.
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Connection closed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: receiver terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Terminating receiver thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: disk( Inconsistent -> Diskless )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: worker terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Terminating worker thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: meta connection shut down by peer.
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: asender terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating asender thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: drbd_pp_alloc interrupted!
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: alloc_ee: Allocation of a page failed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: error receiving RSDataRequest, l: 24!
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Connection closed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: conn( Disconnecting -> StandAlone )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: disk( UpToDate -> Diskless ) pdsk( Inconsistent -> DUnknown )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: net_ee not empty, killed 5000 entries
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: receiver terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating receiver thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: drbd_bm_resize called with capacity == 0
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: worker terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating worker thread
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Starting worker thread (from cqueue [86])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: disk( Diskless -> Attaching )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: No usable activity log found.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Method to ensure write ordering: barrier
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: drbd_bm_resize called with capacity == 1887428655
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: resync bitmap: bits=235928582 words=3686385
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: size = 900 GB (943714327 KB)
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: recounting of set bits took additional 6 jiffies
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: 884 GB (231676934 bits) marked out-of-sync by on disk bit-map.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: disk( Attaching -> Inconsistent )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Barriers not supported on meta data device - disabling
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Starting worker thread (from cqueue [86])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: disk( Diskless -> Attaching )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: No usable activity log found.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Method to ensure write ordering: barrier
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: max_segment_size ( = BIO size ) = 32768
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: drbd_bm_resize called with capacity == 1887444720
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: resync bitmap: bits=235930590 words=3686416
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: size = 900 GB (943722360 KB)
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: recounting of set bits took additional 6 jiffies
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: 742 GB (194495454 bits) marked out-of-sync by on disk bit-map.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Barriers not supported on meta data device - disabling
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: conn( StandAlone -> Unconnected )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Starting receiver thread (from drbd0_worker [6688])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: receiver (re)started
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: conn( Unconnected -> WFConnection )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: conn( StandAlone -> Unconnected )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Starting receiver thread (from drbd1_worker [6695])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: receiver (re)started
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: conn( Unconnected -> WFConnection )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Starting asender thread (from drbd0_receiver [6717])
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: data-integrity-alg: <not-used>
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: drbd_sync_handshake:
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: self 88E0ED22FECE2B68:0000000000000000:0000000000000000:0000000000000000 bits:231676934 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: peer 5299E3A47E1A3F30:88E0ED22FECE2B69:8810A1CE27BB9808:27DB4B359F02FE48 bits:231676934 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: uuid_compare()=-1 by rule 50
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Becoming sync target due to disk states.
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Handshake successful: Agreed network protocol version 91
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Starting asender thread (from drbd1_receiver [6721])
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: data-integrity-alg: <not-used>
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: drbd_sync_handshake:
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: self 12DFCDD264D5E7AE:20C37C56C7437B76:441CA1FB5B900754:4A4B9D0203491EC4 bits:194495454 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: peer 20C37C56C7437B76:0000000000000000:0000000000000000:0000000000000000 bits:194495454 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: uuid_compare()=1 by rule 70
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Becoming sync source due to disk states.
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Began resync as SyncTarget (will sync 926707736 KB [231676934 bits set]).
Nov 24 13:03:54 hp-tm-41 kernel: block drbd1: conn( WFBitMapS -> SyncSource )
Nov 24 13:03:54 hp-tm-41 kernel: block drbd1: Began resync as SyncSource (will sync 777981816 KB [194495454 bits set]).

Thanks
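
Editorial aside: when comparing /proc/drbd snapshots like the ones quoted in this message, it can help to pull the counters out programmatically, for example to watch whether pe (pending requests) and oos (out-of-sync amount) change between samples. The following is only a minimal sketch, assuming the 8.3-style "key:value" counter line shown above. The function name parse_drbd_counters is hypothetical and not part of DRBD.

```python
import re

# Counter line copied from the drbd1 entry in the /proc/drbd output quoted in the message.
SAMPLE = (
    "ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 "
    "ua:0 ap:0 ep:1 wo:b oos:777971256"
)


def parse_drbd_counters(line):
    """Split a counter line into a dict; purely numeric values become int.

    Hypothetical helper for illustration only.
    """
    fields = {}
    for key, value in re.findall(r"(\w+):(\S+)", line):
        fields[key] = int(value) if value.isdigit() else value
    return fields


counters = parse_drbd_counters(SAMPLE)
print("pending:", counters["pe"], "out-of-sync:", counters["oos"])
```

Running the parser on two snapshots taken a few seconds apart and comparing the oos values would show whether the resync is actually progressing or has stopped.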