Lars Ellenberg wrote:

>/ 2004-04-22 11:04:53 +0000
>\ Ron O'Hara:
>
>>Hi,
>>
>>I may have hit a misunderstanding of the correct operational sequence,
>>or maybe some bug. This is with 0.7 pulled from CVS today (22nd April
>>2004).
>>
>>Scenario .... two systems
>>
>>Install drbd on both systems.. identical systems ...
>>
>>modprobe drbd on sys1
>>modprobe drbd on sys2
>>
>>On sys1
>>   /sbin/drbdadm up all
>>   /sbin/drbdadm wait_connect all
>>Then on sys2
>>   /sbin/drbdadm up all
>>   /sbin/drbdadm wait_connect all
>>
>>All connects nicely - both ends are Secondary ... and sync up
>>
>>On sys1
>>   /sbin/drbdadm primary r0
>>
>>Looks good in /proc/drbd (Primary/Secondary as expected)
>>
>>
>>mount /dev/nb0 /data      <<<<< this just hangs
>>
>>
>>
>>If I go onto sys2, and /sbin/drbdadm down all
>>
>>then the mount completes .... and I can bring connect sys2 again .....
>>
>>Any ideas on where to look/trace/probe to get more info on what syscall
>>'mount' is stuck in?
>>
>>
>
>maybe, just maybe, systems are too busy synching,
>and mount just is too slow, because it needs to access/replay the
>journal, DRBD needs to make sure sync and application requests do not
>conflict. hopefully this is no real bug, but only needs tuning.
>
>how long did you wait when mount was "just hanging" ?
>
>

I waited about 2 minutes. There was only 4 KB of data to be synced.

Apr 22 09:57:16 vossdir1 kernel: drbd: initialised. Version: 0.7-cvs-2004-04-22 (api:72/proto:72)

    new module loaded here

Apr 22 09:57:22 vossdir1 kernel: drbd0: size = 53919620 KB
Apr 22 09:57:22 vossdir1 kernel: drbd0: 4 KB marked out-of-sync by on disk bit-map.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Connection established.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Resync started as source (need to sync 4 KB).
Apr 22 09:57:22 vossdir1 kernel: drbd0: Resync done (total 1 sec; 4 K/sec)

    at this point the systems are in sync and I issued the mount

Apr 22 09:59:40 vossdir1 kernel: drbd0: sock was shut down by peer

    This is when I drbdadm down r0 on sys2

Apr 22 09:59:40 vossdir1 kernel: drbd0: short read expecting header on sock: r=0
Apr 22 09:59:40 vossdir1 kernel: drbd0: meta connection shut down by peer.
Apr 22 09:59:40 vossdir1 kernel: drbd0: asender terminated
Apr 22 09:59:40 vossdir1 kernel: drbd0: worker terminated
Apr 22 09:59:40 vossdir1 kernel: EXT3 FS on drbd0, internal journal

    The mount completes .... it looks like the journal replay was
    stalled until this was done... is drbd holding some
    flag/semaphore that ext3 wants ?

Apr 22 09:59:40 vossdir1 kernel: drbd0: Connection lost.
Apr 22 09:59:56 vossdir1 kernel: drbd0: Connection established.
Apr 22 09:59:56 vossdir1 kernel: drbd0: Resync started as source (need to sync 0 KB).
Apr 22 09:59:56 vossdir1 kernel: drbd0: Resync done (total 1 sec; 0 K/sec)

    Here is where I bring sys2 up again..

This is easily repeatable .... umount also hangs in a similar way ...
and as a side effect prevents system shutdown too.

Ron

> Lars Ellenberg
>_______________________________________________
>drbd-user mailing list
>drbd-user at lists.linbit.com
>http://lists.linbit.com/mailman/listinfo/drbd-user
>
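
On the question of where to look when 'mount' is stuck, a minimal
sketch follows. It is not from the thread and does nothing
DRBD-specific. The helper names elapsed_seconds and list_blocked are
made up for illustration, and it assumes a Linux system with awk and
the procps version of ps. The first helper measures the gap between
two syslog timestamps (here "Resync done" and "sock was shut down by
peer"); the second lists processes in uninterruptible sleep together
with the kernel function they are waiting in, which is usually the
first hint about what a hung mount or umount is blocked on.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of DRBD.

# Print the number of seconds between two HH:MM:SS timestamps
# taken from the same day of a syslog file.
elapsed_seconds() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ":")
        split($2, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

# List processes in uninterruptible sleep (state D) along with
# their kernel wait channel. Keeps the ps header line as well.
list_blocked() {
    ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
}

# Example use, with the timestamps from the log above:
#   elapsed_seconds 09:57:22 09:59:40    # prints 138
#   list_blocked                         # run while mount is hanging
```

With the timestamps in the log, the mount was blocked for 138
seconds, which matches the "about 2 minutes" reported. If list_blocked
shows mount in state D, the wchan column names the kernel function it
sleeps in, which could help tell a wait inside drbd from one inside
ext3.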