Hi there!
I've been reading through the mailing list for a while now and can't find
anything matching my problem, so I hope someone can help this way.
I'm using 2 "standard" hosts running kernel 2.4.26 with the vserver ("ctx")
patch 1.27.
One has an Intel CPU, the other an AMD, but apart from that their
configuration is very similar:
.) 1 nic to the outside
.) 1 nic internal (10.99.99.1 + .2) w/ crossover cable
.) crossover serial cable connected to ttyS0 on both
.) 2 HDDs (hda+hdc) with the following partition table:
Disk /dev/hda: 120.0 GB, 120060444672 bytes
255 heads, 63 sectors/track, 14596 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         973     7815591   fd  Linux raid autodetect
/dev/hda2             974        1459     3903795   fd  Linux raid autodetect
/dev/hda3            1460        1708     2000092+  82  Linux swap
/dev/hda4            1709       14500   102751740   fd  Linux raid autodetect
The partitions are mounted as follows:
/dev/md1 on / type ext3 (rw)
/dev/md2 on /var type ext3 (rw)
/dev/md4 on /vservers type ext3 (rw) (yes, I called it md4.)
Both systems are running Debian sarge. I now wanted to make md4 "a drbd
device" and used the following drbd.conf:
----------
global {
    minor_count=1
    # disable_io_hints
}
resource vs {
    protocol = C
    fsckcmd = /bin/true
    # inittimeout=60
    # skip-wait
    # load-only
    # incon-degr-cmd=halt -f
    disk {
        do-panic
        disk-size = 51375808k
    }
    net {
        sndbuf-size = 512k
        skip-sync
        # sync-group = 1
        sync-min = 10M      # syncer tries hard to not drop below this rate
        sync-max = 25M      # if you don't care about network saturation
        tl-size = 5000      # transfer log size, ensures strict write ordering
        timeout = 60        # unit: 0.1 seconds
        connect-int = 10    # unit: seconds
        ping-int = 10       # unit: seconds
        ko-count = 10       # if some block send times out this many times,
                            # the peer is considered dead, even if it still
                            # answers ping requests
    }
    on cthon {
        device = /dev/nb0
        disk = /dev/md4
        address = 10.99.99.1
        port = 7788
    }
    on nightmare {
        device = /dev/nb0
        disk = /dev/md4
        address = 10.99.99.2
        port = 7788
    }
}
----------
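The timing options in the net section use mixed units, so here is a quick conversion to seconds. This is only a rough sketch in Python, not anything from drbd itself. The variable names are made up for illustration, and the "give up" estimate assumes that ko-count drops by one for each send that times out, which is how I read the comment in the config.

```python
# Values copied from the net section of the drbd.conf above.
timeout_tenths = 60   # timeout = 60, unit: 0.1 seconds
ko_count = 10         # ko-count = 10

# Convert the send timeout to seconds.
timeout_s = timeout_tenths / 10

# Assumption: ko-count is decremented once per timed-out send, so the peer
# is given up after roughly ko_count * timeout_s seconds of blocked sends.
give_up_after_s = ko_count * timeout_s

print(f"send timeout: {timeout_s:.0f} s")
print(f"peer considered dead after about {give_up_after_s:.0f} s")
```

With these settings a stuck send would be retried for about a minute before the connection is dropped.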
I was first using "drbd-0.6.12.tar.gz" and then tried checking out the
current stable CVS, but it made no difference. My problem is that when I run
/etc/init.d/drbd start on both boxes, everything first seems to work as it
should:
primary (cthon):
version: 0.6.12 (api:64/proto:62)
0: cs:Connected st:Primary/Secondary ns:0 nr:0 dw:0 dr:0 pe:0 ua:0
NEEDS_SYNC
secondary (nightmare):
version: 0.6.12 (api:64/proto:62)
0: cs:WFConnection st:Secondary/Unknown ns:0 nr:0 dw:0 dr:0 pe:0 ua:0
NEEDS_SYNC INCONSISTENT
Then I start the resync process with:
cthon:~# drbdsetup /dev/nb0 replicate
It does work, stating
0: cs:SyncingAll st:Primary/Secondary ns:116020 nr:0 dw:0 dr:116224 pe:20469 ua:0
[>...................] sync'ed: 0.3% (50057/50171)M
finish: 4:02:59h speed: 3,529 (3,529) K/sec
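For anyone who wants to compare these counters between the two nodes in a script, the device line can be split into its key:value pairs. This is a small Python sketch, assuming the line format shown above; `parse_drbd_status` is a hypothetical helper name, not part of drbd.

```python
import re

def parse_drbd_status(line):
    """Split a /proc/drbd device line into a dict of its key:value fields."""
    fields = dict(re.findall(r"(\w+):(\S+)", line))
    # Numeric counters become ints; cs and st stay strings.
    return {k: int(v) if v.isdigit() else v for k, v in fields.items()}

# The status line from above, with the wrapped part rejoined.
line = ("0: cs:SyncingAll st:Primary/Secondary ns:116020 nr:0 dw:0 "
        "dr:116224 pe:20469 ua:0")
status = parse_drbd_status(line)
print(status["cs"], status["st"], status["pe"])
```

The leading minor number ("0:") is skipped because it has no value attached to it.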
BUT in the meantime the syncer crashed on the primary. According to syslog:
Jun 25 14:10:33 cthon drbd: ===> drbd start <===
Jun 25 14:10:33 cthon drbd: modprobe -s drbd minor_count=1
Jun 25 14:10:33 cthon kernel: drbd: initialised. Version: 0.6.12 (api:64/proto:62)
Jun 25 14:10:33 cthon drbd: drbdsetup /dev/nb0 disk /dev/md4 --do-panic --disk-size=51375808k
Jun 25 14:10:33 cthon drbd: drbdsetup /dev/nb0 net 10.99.99.1:7788 10.99.99.2:7788 C --sndbuf-size=512k --skip-sync --sync-min=10M --sync-max=25M --tl-size=5000 --timeout=60 --connect-int=10 --ping-int=10 --ko-count=10
Jun 25 14:10:33 cthon drbd: drbdsetup /dev/nb0 wait_connect -t 0
Jun 25 14:10:33 cthon kernel: drbd0: Connection established. size=51375808 KB / blksize=4096 B
Jun 25 14:10:33 cthon kernel: klogd 1.4.1, ---------- state change ----------
Jun 25 14:10:33 cthon kernel: Loaded 60 symbols from 1 module.
Jun 25 14:11:07 cthon kernel: drbd0: FULL Synchronisation started blks=64
Jun 25 14:11:10 cthon kernel: general protection fault: 0000
Jun 25 14:11:10 cthon kernel: CPU: 0
Jun 25 14:11:10 cthon kernel: EIP: 0010:[<de92d773>] Not tainted
Jun 25 14:11:10 cthon kernel: EFLAGS: 00010286
Jun 25 14:11:10 cthon kernel: eax: ffffffff ebx: 00000000 ecx: 00000001 edx: ffffffff
Jun 25 14:11:10 cthon kernel: esi: ffffffff edi: dbd33f9c ebp: 0000f200 esp: dbd33f44
Jun 25 14:11:10 cthon kernel: ds: 0018 es: 0018 ss: 0018
Jun 25 14:11:10 cthon kernel: Process drbdd_0 (pid: 638, stackpage=dbd33000)
Jun 25 14:11:10 cthon kernel: Stack: dbd33f14 00000001 00000000 00000000 00004100 00000286 00000000 dcc88000
Jun 25 14:11:10 cthon kernel: dbd33f9c 00000000 de92aff9 ffffffff 0000f200 00000001 00000000 00000000
Jun 25 14:11:10 cthon kernel: 00000000 0000214a 00000000 dcc88000 dbf5c5a0 00000002 67027483 00000300
Jun 25 14:11:10 cthon kernel: Call Trace: [<de92aff9>] [<de92b45a>] [<de92f4db>] [<de926a8d>] [<c010738e>]
Jun 25 14:11:10 cthon kernel: [<de926a60>]
Jun 25 14:11:10 cthon kernel:
Jun 25 14:11:10 cthon kernel: Code: 8b 06 0f b6 58 14 a1 e4 fb 92 de 69 db dc 02 00 00 01 c3 31
Jun 25 14:11:21 cthon kernel: <3>drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=9
Jun 25 14:11:24 cthon kernel: drbd0: ping ack did not arrive, trying to reconnect
Jun 25 14:11:27 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=8
Jun 25 14:11:33 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=7
Jun 25 14:11:39 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=6
Jun 25 14:11:45 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=5
Jun 25 14:11:51 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=4
Jun 25 14:11:57 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=3
Jun 25 14:12:03 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=2
Jun 25 14:12:09 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=1
Jun 25 14:12:14 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg returned -104
Jun 25 14:12:14 cthon kernel: drbd0: Syncer send failed.
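To check how the countdown relates to the configured timeout, the gaps between the "ko=" lines can be measured straight from the log. This is a rough Python sketch using only the first few countdown lines from the excerpt above (wrapped lines rejoined); the regular expression is my own guess at the syslog layout, not an official format.

```python
import re
from datetime import datetime

# First few countdown lines from the syslog excerpt, wrapped lines rejoined.
log = """\
Jun 25 14:11:21 cthon kernel: <3>drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=9
Jun 25 14:11:24 cthon kernel: drbd0: ping ack did not arrive, trying to reconnect
Jun 25 14:11:27 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=8
Jun 25 14:11:33 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=7
"""

# Capture the time of day and the remaining ko value; other lines are skipped.
pattern = re.compile(r"^\w{3}\s+\d+\s+(\d\d:\d\d:\d\d)\s.*\bko=(\d+)$")

events = []
for line in log.splitlines():
    m = pattern.match(line)
    if m:
        when = datetime.strptime(m.group(1), "%H:%M:%S")
        events.append((when, int(m.group(2))))

# Seconds between consecutive countdown messages.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
print(gaps)
```

The countdown lines are 6 seconds apart, which fits timeout = 60 (in units of 0.1 seconds) from the config, so the syncer appears to be blocked on every single send after the fault.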
This definitely doesn't look right to me =)
I noticed people were talking about the blksize, but how can I change that
one? Am I overlooking something else? Any help would be appreciated.
TIA!
--
Markus Rambossek
MXR66-RIPE
mxr at mos.at <mailto:mxr at mos.at>
carrier66.net NetWork
DataCenter Wien
Shuttleworthstrasse 4-8
1210 Wien
Austria
Mobil: +43 650 4126691