/ 2004-07-23 09:04:15 +0200
\ Martin Bene:
> Hi,
>
> I just ran into a couple of strange effects when setting up drbd 0.7 on
> our test cluster;
>
> 4 drbd devices, 2 - 18 GB each.
>
> Setup was slightly non-standard: I configured just one side to
> primary/consistent before adding the 2nd side (needed to copy data onto
> the new drbd devices).
>
> drbd setup of node1: all went as expected, no trouble. created
> filesystems, mounted, copied data onto drbd.
>
> drbd setup on node2:
> # /etc/init.d/drbd start
>  * Starting DRBD...
> Child process does not terminate!
> Exiting.
>
> drbdadm up all in the startup script finished configuration for just the
> first drbd device. setup for the 2nd one seems to have hung for some
> time
>
> 31008 ? SW 0:00 [drbd0_worker]
> 29849 ? RW 0:11 [drbd0_receiver]
> 2845 pts/0 D 0:00 /sbin/drbdsetup /dev/nbd/1 disk /dev/md5
> internal -1 --on-io-error=panic
> 5207 ? RW 0:00 [drbd0_asender]
>
> finally resulting in
>
> test-neu1 user # cat /proc/drbd
> version: 0.7.0 svn $Rev: 1438 $ (api:74/proto:74)
>
>  0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
>     ns:0 nr:27311780 dw:27311780 dr:0 al:0 bm:3447 lo:134 pe:493 ua:134
> ap:0
>     [==================>.] sync'ed: 93.7% (1818/28487)M
>     finish: 0:01:07 speed: 27,482 (25,008) K/sec
>  1: cs:StandAlone st:Secondary/Unknown ld:Inconsistent
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:1781 lo:0 pe:0 ua:0 ap:0
>  2: cs:Unconfigured
>  3: cs:Unconfigured
>
> on the 2nd node.
>
> Timeout on initialising the internal metadata while resync is already
> running (and this slowing things down) on the 1st device?

yes, this may be the cause.
you happen to use the same underlying physical devices?
is this a 2.4 kernel?

> Here's the relevant syslog entries:
>
> 08:30:40 drbd: initialised. Version: 0.7.0 svn $Rev: 1438 $ (api:74/proto:74)
> 08:30:40 drbd: registered as block device major 43
> 08:30:40 drbd0: Creating state block
> 08:30:40 drbd0: resync bitmap: bits=7292848 words=227902
> 08:30:40 drbd0: size = 29171392 KB
> 08:30:40 drbd0: Assuming that all blocks are out of sync (aka FullSync)
> 08:30:45 drbd0: 29171392 KB now marked out-of-sync by on disk bit-map.
> 08:30:46 drbd1: Creating state block
> 08:30:46 drbd1: resync bitmap: bits=7292848 words=227902
> 08:30:46 drbd1: size = 29171392 KB
> 08:30:46 drbd1: Assuming that all blocks are out of sync (aka FullSync)
> 08:30:46 drbd0: Handshake successful: DRBD Protocol version 74
> 08:30:46 drbd0: Connection established.
> 08:30:46 drbd0: Secondary/Unknown --> Secondary/Primary
> 08:30:46 drbd0: Resync started as SyncTarget
> (need to sync 29171392 KB [7292848 bits set]).
> 08:31:41 drbd1: 29171392 KB now marked out-of-sync by on disk bit-map.
>
> Another effect: The progress bar for resync displayed on node1 seems to
> be inconsistent
>
> test-neu2 drbd # cat /proc/drbd
> version: 0.7.0 svn $Rev: 1438 $ (api:74/proto:74)
>
>  0: cs:SyncSource st:Primary/Secondary ld:Consistent
>     ns:8745864 nr:0 dw:87300 dr:9494623 al:109 bm:1412 lo:700 pe:1350
> ua:700 ap:0
>     [=================>..] sync'ed: 88.9% (19952/28487)M
>     finish: 0:13:34 speed: 25,076 (25,187) K/sec
>
> test-neu2 drbd # cat /proc/drbd
> version: 0.7.0 svn $Rev: 1438 $ (api:74/proto:74)
>
>  0: cs:SyncSource st:Primary/Secondary ld:Consistent
>     ns:12940824 nr:0 dw:87492 dr:13690591 al:109 bm:1668 lo:1000 pe:1876
> ua:1000 ap:0
>     [========>...........] sync'ed: 44.4% (15858/28487)M
>     finish: 0:09:20 speed: 28,933 (25,160) K/sec
>
> Time to finish and synced/size info seem to be OK; but the progress bar
> definitely isn't.. started out at ~50%, went to 100% and then jumped
> back ~40.

we patched the code there with some 64bit long compatibility things.
seems like there sneaked in some integer overflow issue...

	Lars Ellenberg

--
please use the "List-Reply" function of your email client.
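
The jumping progress bar is consistent with a 32-bit wrap in the per-mille calculation. What follows is a minimal sketch, not the actual DRBD code. It assumes the percentage is computed as 1000 - (bits_left * 1000) / bits_total with the product truncated to 32 bits, and that one bitmap bit covers 4 KB (29171392 KB / 7292848 bits in the syslog above). The function names and the formula itself are assumptions made for illustration. Only the bitmap size and the remaining-MB figures come from the quoted output.

```python
# Hypothetical reconstruction of the progress calculation; NOT taken from
# the DRBD source. Only the constants below come from the quoted logs.
BITS_TOTAL = 7292848   # "resync bitmap: bits=7292848" in the syslog
BITS_PER_MB = 256      # 29171392 KB / 7292848 bits = 4 KB per bit
U32_MASK = 0xFFFFFFFF  # emulate truncation to a 32-bit unsigned integer


def permille_wrapped(left_mb, total_bits=BITS_TOTAL):
    """Per-mille synced, with the intermediate product wrapped to 32 bits."""
    left_bits = left_mb * BITS_PER_MB
    return 1000 - ((left_bits * 1000) & U32_MASK) // total_bits


def permille_exact(left_mb, total_bits=BITS_TOTAL):
    """Per-mille synced, computed without any truncation."""
    left_bits = left_mb * BITS_PER_MB
    return 1000 - (left_bits * 1000) // total_bits


if __name__ == "__main__":
    # Remaining MB as shown in the three /proc/drbd samples.
    for left_mb in (19952, 15858, 1818):
        wrapped = permille_wrapped(left_mb) / 10
        exact = permille_exact(left_mb) / 10
        print(f"left={left_mb:>6} MB  wrapped={wrapped}%  exact={exact}%")
```

Under these assumptions the wrapped calculation yields 88.9%, 44.4% and 93.7% for the three quoted samples, which matches the displayed values, while the untruncated calculation yields 30.0% for the first sample. The product only exceeds 2^32 while more than about 4294967 bits (roughly 16777 MB) remain, which would explain why the bar looks correct later in the resync.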