Hi List,
Please help. I have installed DRBD 8.3.5 on openSUSE 11.1 (kernel
2.6.27.29-0.1).
I ran drbdadm create-md dbms-test on one node and drbdadm create-md
dbms-test2 on the other node, then drbdadm up all on both nodes. I then ran
drbdadm -- --overwrite-data-of-my-peer primary dbms-test on the first node
and the same for dbms-test2 on the other node. Both resyncs run for a short
while and then stall. I have tried older versions without success, and
turning the sync rate down makes no difference. Taking the resources down
and bringing them back up restarts the sync, but it quickly stalls again.
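For reference, the sequence described above can be written down as a dry-run sketch. It only prints the commands and runs nothing. The commands are taken verbatim from this message; the node labels and the `render` helper are illustrative names, not part of DRBD.

```python
import shlex

# Commands exactly as described in the message; the node labels are illustrative.
STEPS = [
    ("first node", ["drbdadm", "create-md", "dbms-test"]),
    ("second node", ["drbdadm", "create-md", "dbms-test2"]),
    ("both nodes", ["drbdadm", "up", "all"]),
    ("first node", ["drbdadm", "--", "--overwrite-data-of-my-peer",
                    "primary", "dbms-test"]),
    ("second node", ["drbdadm", "--", "--overwrite-data-of-my-peer",
                     "primary", "dbms-test2"]),
]


def render(steps):
    """Return one printable line per step; nothing is executed."""
    return [f"[{where}] {shlex.join(argv)}" for where, argv in steps]


if __name__ == "__main__":
    for line in render(STEPS):
        print(line)
```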
I have attached /proc/drbd, /etc/drbd.conf and a section from
/var/log/messages. Any pointers would be greatly appreciated.
/proc/drbd
version: 8.3.5 (api:88/proto:86-91)
GIT-hash: ded8cdf09b0efa1460e8ce7a72327c60ff2210fb build by root@hp-tm-40, 2009-11-24 12:21:46
 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
    ns:160896 nr:0 dw:0 dr:160896 al:0 bm:9 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:926694296
    [>.] sync'ed: 0.1% (905040/905132)M 4972
    stalled
 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r----
    ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 ua:0 ap:0 ep:1 wo:b oos:777971256
    [>.] sync'ed: 0.3% (759736/761856)M
    Stalled
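To compare the counters across repeated stalls, the status above can be pulled apart with a few lines of Python. This is a minimal sketch, not a DRBD tool: `parse_proc_drbd` is a hypothetical helper, `SAMPLE` is the device status pasted above with wrapped lines rejoined, and it assumes that `oos` is reported in KiB.

```python
import re

# The two device entries from the /proc/drbd output above, wrapped lines rejoined.
SAMPLE = """\
 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
    ns:160896 nr:0 dw:0 dr:160896 al:0 bm:9 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:926694296
 1: cs:SyncTarget ro:Secondary/Secondary ds:Inconsistent/UpToDate C r----
    ns:0 nr:2173248 dw:2173248 dr:0 al:0 bm:132 lo:0 pe:29878 ua:0 ap:0 ep:1 wo:b oos:777971256
"""

DEVICE_RE = re.compile(r"^\s*(\d+):\s+(.*)$")   # e.g. " 0: cs:SyncSource ..."
FIELD_RE = re.compile(r"(\w+):(\S+)")            # e.g. "pe:29878"


def parse_proc_drbd(text):
    """Map each minor number to a dict of its key:value fields."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = devices.setdefault(int(match.group(1)), {})
            line = match.group(2)
        if current is None:
            continue  # header lines before the first device entry
        for key, value in FIELD_RE.findall(line):
            current[key] = int(value) if value.isdigit() else value
    return devices


if __name__ == "__main__":
    for minor, fields in sorted(parse_proc_drbd(SAMPLE).items()):
        oos_gib = fields["oos"] / 2**20  # assumes oos is in KiB
        print(f"drbd{minor}: cs={fields['cs']} pe={fields['pe']} "
              f"oos={oos_gib:.1f} GiB")
```

Run against successive snapshots, this makes it easy to see whether `ns`/`nr` still advance and whether `pe` on the sync target keeps its large value while the resync is reported as stalled.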
Drbd.conf
global {
    # minor-count 64;
    # dialog-refresh 5; # 5 seconds
    # disable-ip-verification;
    usage-count no;
}
common {
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
    }
    disk {
        on-io-error detach;
        # fencing resource-only;
    }
    net {
        max-buffers 40000;
        unplug-watermark 40000;
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 90M;
        al-extents 257;
        verify-alg crc32c;
        cpu-mask 1;
    }
}
resource dbms-test {
    protocol C;
    on hp-tm-40 {
        device /dev/drbd0;
        disk /dev/cciss/c0d1p4;
        address 192.168.95.53:7789;
        meta-disk /dev/cciss/c0d1p1[0];
    }
    on hp-tm-41 {
        device /dev/drbd0;
        disk /dev/cciss/c0d1p4;
        address 192.168.95.54:7789;
        meta-disk /dev/cciss/c0d1p1[0];
    }
}
resource dbms-test2 {
    protocol C;
    on hp-tm-40 {
        device /dev/drbd1;
        disk /dev/cciss/c0d1p3;
        address 192.168.95.53:7788;
        meta-disk /dev/cciss/c0d1p2[0];
    }
    on hp-tm-41 {
        device /dev/drbd1;
        disk /dev/cciss/c0d1p3;
        address 192.168.95.54:7788;
        meta-disk /dev/cciss/c0d1p2[0];
    }
}
Section from /var/log/messages
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: peer( Secondary -> Unknown ) conn( SyncTarget -> TearDown ) pdsk( UpToDate -> DUnknown )
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: asender terminated
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Terminating asender thread
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Connection closed
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: conn( TearDown -> Unconnected )
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: receiver terminated
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: Restarting receiver thread
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: receiver (re)started
Nov 24 13:03:43 hp-tm-41 kernel: block drbd0: conn( Unconnected -> WFConnection )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: conn( WFConnection -> Disconnecting )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Discarding network configuration.
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Connection closed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: receiver terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Terminating receiver thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: disk( Inconsistent -> Diskless )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: worker terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd0: Terminating worker thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: meta connection shut down by peer.
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: asender terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating asender thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: drbd_pp_alloc interrupted!
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: alloc_ee: Allocation of a page failed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: error receiving RSDataRequest, l: 24!
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Connection closed
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: conn( Disconnecting -> StandAlone )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: disk( UpToDate -> Diskless ) pdsk( Inconsistent -> DUnknown )
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: net_ee not empty, killed 5000 entries
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: receiver terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating receiver thread
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: drbd_bm_resize called with capacity == 0
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: worker terminated
Nov 24 13:03:46 hp-tm-41 kernel: block drbd1: Terminating worker thread
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Starting worker thread (from cqueue [86])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: disk( Diskless -> Attaching )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: No usable activity log found.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Method to ensure write ordering: barrier
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: drbd_bm_resize called with capacity == 1887428655
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: resync bitmap: bits=235928582 words=3686385
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: size = 900 GB (943714327 KB)
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: recounting of set bits took additional 6 jiffies
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: 884 GB (231676934 bits) marked out-of-sync by on disk bit-map.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: disk( Attaching -> Inconsistent )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Barriers not supported on meta data device - disabling
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Starting worker thread (from cqueue [86])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: disk( Diskless -> Attaching )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: No usable activity log found.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Method to ensure write ordering: barrier
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: max_segment_size ( = BIO size ) = 32768
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: drbd_bm_resize called with capacity == 1887444720
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: resync bitmap: bits=235930590 words=3686416
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: size = 900 GB (943722360 KB)
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: recounting of set bits took additional 6 jiffies
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: 742 GB (194495454 bits) marked out-of-sync by on disk bit-map.
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Barriers not supported on meta data device - disabling
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: conn( StandAlone -> Unconnected )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: Starting receiver thread (from drbd0_worker [6688])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: receiver (re)started
Nov 24 13:03:50 hp-tm-41 kernel: block drbd0: conn( Unconnected -> WFConnection )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: conn( StandAlone -> Unconnected )
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: Starting receiver thread (from drbd1_worker [6695])
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: receiver (re)started
Nov 24 13:03:50 hp-tm-41 kernel: block drbd1: conn( Unconnected -> WFConnection )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Handshake successful: Agreed network protocol version 91
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Starting asender thread (from drbd0_receiver [6717])
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: data-integrity-alg: <not-used>
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: drbd_sync_handshake:
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: self 88E0ED22FECE2B68:0000000000000000:0000000000000000:0000000000000000 bits:231676934 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: peer 5299E3A47E1A3F30:88E0ED22FECE2B69:8810A1CE27BB9808:27DB4B359F02FE48 bits:231676934 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: uuid_compare()=-1 by rule 50
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Becoming sync target due to disk states.
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Handshake successful: Agreed network protocol version 91
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Starting asender thread (from drbd1_receiver [6721])
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: data-integrity-alg: <not-used>
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: drbd_sync_handshake:
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: self 12DFCDD264D5E7AE:20C37C56C7437B76:441CA1FB5B900754:4A4B9D0203491EC4 bits:194495454 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: peer 20C37C56C7437B76:0000000000000000:0000000000000000:0000000000000000 bits:194495454 flags:0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: uuid_compare()=1 by rule 70
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: Becoming sync source due to disk states.
Nov 24 13:03:53 hp-tm-41 kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget )
Nov 24 13:03:53 hp-tm-41 kernel: block drbd0: Began resync as SyncTarget (will sync 926707736 KB [231676934 bits set]).
Nov 24 13:03:54 hp-tm-41 kernel: block drbd1: conn( WFBitMapS -> SyncSource )
Nov 24 13:03:54 hp-tm-41 kernel: block drbd1: Began resync as SyncSource (will sync 777981816 KB [194495454 bits set]).
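As a sanity check on the figures in the log, the bit counts and KB values line up if each bitmap bit covers 4 KiB. The sketch below redoes that arithmetic and estimates a lower bound for the resync time. The helper names are hypothetical, and it assumes that `rate 90M` in the syncer section means 90 MiB/s and that the resync could actually sustain it.

```python
KIB_PER_BIT = 4  # assumption: one bitmap bit covers 4 KiB, which the log figures bear out


def bits_to_kib(bits):
    """Amount of data (KiB) covered by a number of set bitmap bits."""
    return bits * KIB_PER_BIT


def kib_to_bits(kib):
    """Bitmap bits needed to cover a device of the given size, rounded up."""
    return (kib + KIB_PER_BIT - 1) // KIB_PER_BIT


def min_resync_seconds(kib, rate_mib_per_s):
    """Best-case resync duration if the configured rate were sustained."""
    return kib / (rate_mib_per_s * 1024)


if __name__ == "__main__":
    # drbd0: "will sync 926707736 KB [231676934 bits set]"
    print(bits_to_kib(231676934))   # 926707736
    # drbd1: "will sync 777981816 KB [194495454 bits set]"
    print(bits_to_kib(194495454))   # 777981816
    # drbd0: "size = 900 GB (943714327 KB)", "resync bitmap: bits=235928582"
    print(kib_to_bits(943714327))   # 235928582
    hours = min_resync_seconds(926707736, 90) / 3600
    print(f"drbd0 at 90 MiB/s: at least {hours:.1f} h")
```

Under those assumptions a full resync of drbd0 would take roughly 2.8 hours at best, so a resync that stops at 0.1% is not simply a slow one.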
Thanks