Hi there!

I've been reading thru the ML for a while now and can't find anything matching my problem, so i hope someone can help this way.

i'm using 2 "standard" hosts running 2.4.26 with the vserver ("ctx") patch 1.27. one has an intel cpu, the other amd, but despite that, their configuration is very similar:

.) 1 nic to the outside
.) 1 nic internal (10.99.99.1 + .2) w/ crossover cable
.) crossover serial cable connected to ttyS0 on both
.) 2 hdds (hda+hdc) with the following ptbl:

Disk /dev/hda: 120.0 GB, 120060444672 bytes
255 heads, 63 sectors/track, 14596 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End      Blocks   Id  System
/dev/hda1             1       973     7815591   fd  Linux raid autodetect
/dev/hda2           974      1459     3903795   fd  Linux raid autodetect
/dev/hda3          1460      1708     2000092+  82  Linux swap
/dev/hda4          1709     14500   102751740   fd  Linux raid autodetect

the partitions are mounted as follows:

/dev/md1 on / type ext3 (rw)
/dev/md2 on /var type ext3 (rw)
/dev/md4 on /vservers type ext3 (rw)

(yes i called it md4.)

both systems are running debian sarge.
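As a quick sanity check on the partition table above, the block counts can be recomputed from the cylinder ranges. This is only a sketch: the constant and function names are made up for illustration, and it assumes that fdisk's "Blocks" column is in 1 KiB units and that the first track of the disk (63 sectors) is reserved ahead of hda1, as is conventional for DOS partition tables.

```python
# Geometry as printed by fdisk in the listing above.
CYLINDER_BYTES = 16065 * 512   # "Units = cylinders of 16065 * 512 = 8225280 bytes"
TRACK_SECTORS = 63             # "63 sectors/track"

# (device, start cylinder, end cylinder, blocks as printed by fdisk)
PARTITIONS = [
    ("/dev/hda1", 1, 973, 7815591),
    ("/dev/hda2", 974, 1459, 3903795),
    ("/dev/hda3", 1460, 1708, 2000092),   # fdisk prints "2000092+" (odd sector count)
    ("/dev/hda4", 1709, 14500, 102751740),
]


def expected_blocks(start, end):
    """Size in 1 KiB blocks implied by a cylinder range, rounded down."""
    size_bytes = (end - start + 1) * CYLINDER_BYTES
    if start == 1:
        # Assumption: the first track is reserved for the MBR.
        size_bytes -= TRACK_SECTORS * 512
    return size_bytes // 1024


for dev, start, end, blocks in PARTITIONS:
    calc = expected_blocks(start, end)
    status = "ok" if calc == blocks else "MISMATCH"
    print(f"{dev}: computed {calc} KiB, fdisk reports {blocks} KiB [{status}]")
```

Under those assumptions all four entries are consistent with the listed geometry, so the table itself does not look damaged.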
i now wanted to make md4 "a drbd device" and used the following drbd.conf:

----------
global {
  minor_count=1
  # disable_io_hints
}

resource vs {
  protocol = C
  fsckcmd = /bin/true
  # inittimeout=60
  # skip-wait
  # load-only
  # incon-degr-cmd=halt -f

  disk {
    do-panic
    disk-size = 51375808k
  }

  net {
    sndbuf-size = 512k
    skip-sync
    # sync-group = 1
    sync-min = 10M     # syncer tries hard to not drop below this rate
    sync-max = 25M     # if you don't care about network saturation
    tl-size = 5000     # transfer log size, ensures strict write ordering
    timeout = 60       # unit: 0.1 seconds
    connect-int = 10   # unit: seconds
    ping-int = 10      # unit: seconds
    ko-count = 10      # if some block send times out this many times,
                       # the peer is considered dead, even if it still
                       # answers ping requests
  }

  on cthon {
    device = /dev/nb0
    disk = /dev/md4
    address = 10.99.99.1
    port = 7788
  }

  on nightmare {
    device = /dev/nb0
    disk = /dev/md4
    address = 10.99.99.2
    port = 7788
  }
}
----------

i was first using "drbd-0.6.12.tar.gz" and then tried checking out the current stable cvs, but no difference.

my problem is that, when i run /etc/init.d/drbd start on both boxes, everything first seems to work as it should:

primary (cthon):

version: 0.6.12 (api:64/proto:62)
0: cs:Connected st:Primary/Secondary ns:0 nr:0 dw:0 dr:0 pe:0 ua:0 NEEDS_SYNC

secondary (nightmare):

version: 0.6.12 (api:64/proto:62)
0: cs:WFConnection st:Secondary/Unknown ns:0 nr:0 dw:0 dr:0 pe:0 ua:0 NEEDS_SYNC INCONSISTENT

then i start the resync process with:

cthon:~# drbdsetup /dev/nb0 replicate

it does work, stating

0: cs:SyncingAll st:Primary/Secondary ns:116020 nr:0 dw:0 dr:116224 pe:20469 ua:0
   [>...................] sync'ed: 0.3% (50057/50171)M
   finish: 4:02:59h speed: 3,529 (3,529) K/sec

BUT in the meantime the syncer crashed on the primary.. according to syslog:

Jun 25 14:10:33 cthon drbd: ===> drbd start <===
Jun 25 14:10:33 cthon drbd: modprobe -s drbd minor_count=1
Jun 25 14:10:33 cthon kernel: drbd: initialised. Version: 0.6.12 (api:64/proto:62)
Jun 25 14:10:33 cthon drbd: drbdsetup /dev/nb0 disk /dev/md4 --do-panic --disk-size=51375808k
Jun 25 14:10:33 cthon drbd: drbdsetup /dev/nb0 net 10.99.99.1:7788 10.99.99.2:7788 C --sndbuf-size=512k --skip-sync --sync-min=10M --sync-max=25M --tl-size=5000 --timeout=60 --connect-int=10 --ping-int=10 --ko-count=10
Jun 25 14:10:33 cthon drbd: drbdsetup /dev/nb0 wait_connect -t 0
Jun 25 14:10:33 cthon kernel: drbd0: Connection established. size=51375808 KB / blksize=4096 B
Jun 25 14:10:33 cthon kernel: klogd 1.4.1, ---------- state change ----------
Jun 25 14:10:33 cthon kernel: Loaded 60 symbols from 1 module.
Jun 25 14:11:07 cthon kernel: drbd0: FULL Synchronisation started blks=64
Jun 25 14:11:10 cthon kernel: general protection fault: 0000
Jun 25 14:11:10 cthon kernel: CPU: 0
Jun 25 14:11:10 cthon kernel: EIP: 0010:[<de92d773>] Not tainted
Jun 25 14:11:10 cthon kernel: EFLAGS: 00010286
Jun 25 14:11:10 cthon kernel: eax: ffffffff ebx: 00000000 ecx: 00000001 edx: ffffffff
Jun 25 14:11:10 cthon kernel: esi: ffffffff edi: dbd33f9c ebp: 0000f200 esp: dbd33f44
Jun 25 14:11:10 cthon kernel: ds: 0018 es: 0018 ss: 0018
Jun 25 14:11:10 cthon kernel: Process drbdd_0 (pid: 638, stackpage=dbd33000)
Jun 25 14:11:10 cthon kernel: Stack: dbd33f14 00000001 00000000 00000000 00004100 00000286 00000000 dcc88000
Jun 25 14:11:10 cthon kernel:        dbd33f9c 00000000 de92aff9 ffffffff 0000f200 00000001 00000000 00000000
Jun 25 14:11:10 cthon kernel:        00000000 0000214a 00000000 dcc88000 dbf5c5a0 00000002 67027483 00000300
Jun 25 14:11:10 cthon kernel: Call Trace: [<de92aff9>] [<de92b45a>] [<de92f4db>] [<de926a8d>] [<c010738e>]
Jun 25 14:11:10 cthon kernel:   [<de926a60>]
Jun 25 14:11:10 cthon kernel:
Jun 25 14:11:10 cthon kernel: Code: 8b 06 0f b6 58 14 a1 e4 fb 92 de 69 db dc 02 00 00 01 c3 31
Jun 25 14:11:21 cthon kernel: <3>drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=9
Jun 25 14:11:24 cthon kernel: drbd0: ping ack did not arrive, trying to reconnect
Jun 25 14:11:27 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=8
Jun 25 14:11:33 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=7
Jun 25 14:11:39 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=6
Jun 25 14:11:45 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=5
Jun 25 14:11:51 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=4
Jun 25 14:11:57 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=3
Jun 25 14:12:03 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=2
Jun 25 14:12:09 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg timeout count down: ko=1
Jun 25 14:12:14 cthon kernel: drbd0: [drbd_syncer_0/685] sock_sendmsg returned -104
Jun 25 14:12:14 cthon kernel: drbd0: Syncer send failed.

this definitely doesn't look right to me =)

i noticed people were talking about the blksize, but how can i change that one? am i overlooking something else?

any help would be appreciated.. tia!

-- 
Markus Rambossek
MXR66-RIPE
mxr at mos.at
carrier66.net NetWork DataCenter Wien
Shuttleworthstrasse 4-8
1210 Wien
Austria
Mobil: +43 650 4126691
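
For reference, the spacing of the "timeout count down" messages in the syslog excerpt above can be cross-checked against the net settings (timeout = 60 in units of 0.1 seconds, ko-count = 10). The sketch below is only a rough check, not anything DRBD ships: the helper name is made up, the log lines are shortened copies of the ones quoted above, and it assumes all timestamps fall within the same hour of the same day.

```python
from datetime import datetime

TIMEOUT_TENTHS = 60   # "timeout = 60  # unit: 0.1 seconds" from the config
KO_COUNT = 10         # "ko-count = 10" from the config

# Shortened copies of the countdown lines from the syslog excerpt.
LOG = """\
Jun 25 14:11:21 cthon kernel: sock_sendmsg timeout count down: ko=9
Jun 25 14:11:27 cthon kernel: sock_sendmsg timeout count down: ko=8
Jun 25 14:11:33 cthon kernel: sock_sendmsg timeout count down: ko=7
Jun 25 14:11:39 cthon kernel: sock_sendmsg timeout count down: ko=6
Jun 25 14:11:45 cthon kernel: sock_sendmsg timeout count down: ko=5
Jun 25 14:11:51 cthon kernel: sock_sendmsg timeout count down: ko=4
Jun 25 14:11:57 cthon kernel: sock_sendmsg timeout count down: ko=3
Jun 25 14:12:03 cthon kernel: sock_sendmsg timeout count down: ko=2
Jun 25 14:12:09 cthon kernel: sock_sendmsg timeout count down: ko=1
"""


def countdown_entries(log):
    """Return (ko, time) pairs for every countdown line in the log text."""
    entries = []
    for line in log.splitlines():
        if "timeout count down: ko=" not in line:
            continue
        # Third whitespace-separated field is the HH:MM:SS timestamp.
        stamp = datetime.strptime(line.split()[2], "%H:%M:%S")
        ko = int(line.rsplit("ko=", 1)[1])
        entries.append((ko, stamp))
    return entries


entries = countdown_entries(LOG)
gaps = [(b[1] - a[1]).total_seconds() for a, b in zip(entries, entries[1:])]
expected_gap = TIMEOUT_TENTHS / 10   # configured send timeout in seconds

print("configured timeout:", expected_gap, "s")
print("observed gaps:     ", gaps)
print("countdown values:  ", [ko for ko, _ in entries])
```

If the observed gaps match the configured timeout, the countdown is just ko-count doing what the config asks for after the general protection fault, rather than a separate network problem.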