Lars Ellenberg wrote:
>/ 2004-04-22 11:04:53 +0000
>\ Ron O'Hara:
>
>
>>Hi,
>>
>>I may have hit a misunderstanding of the correct operational sequence,
>>or maybe some bug. This is with 0.7 pulled from CVS today (22nd April
>>2004).
>>
>>Scenario .... two systems
>>
>>Install drbd on both systems.. identical systems ...
>>
>>modprobe drbd on sys1
>>modprobe drbd on sys2
>>
>>On sys1
>> /sbin/drbdadm up all
>> /sbin/drbdadm wait_connect all
>>Then on sys2
>> /sbin/drbdadm up all
>> /sbin/drbdadm wait_connect all
>>
>>All connects nicely - both ends are Secondary ... and sync up
>>
>>On sys1
>> /sbin/drbdadm primary r0
>>
>>Looks good in /proc/drbd (Primary/Secondary as expected)
>>
>>
>>mount /dev/nb0 /data <<<<< this just hangs
>>
>>
>>
>>If I go onto sys2, and /sbin/drbdadm down all
>>
>>then the mount completes .... and I can connect sys2 again .....
>>
>>Any ideas on where to look/trace/probe to get more info on what syscall
>>'mount' is stuck in?
>>
>>
>
>maybe, just maybe, systems are too busy synching,
>and mount just is too slow, because it needs to access/replay the
>journal, DRBD needs to make sure sync and application requests do not
>conflict. hopefully this is no real bug, but only needs tuning.
>
>how long did you wait when mount was "just hanging" ?
>
>
I waited about 2 minutes. There was only 4 KB of data to be synced.

Apr 22 09:57:16 vossdir1 kernel: drbd: initialised. Version: 0.7-cvs-2004-04-22 (api:72/proto:72)

New module loaded here.

Apr 22 09:57:22 vossdir1 kernel: drbd0: size = 53919620 KB
Apr 22 09:57:22 vossdir1 kernel: drbd0: 4 KB marked out-of-sync by on disk bit-map.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Connection established.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Resync started as source (need to sync 4 KB).
Apr 22 09:57:22 vossdir1 kernel: drbd0: Resync done (total 1 sec; 4 K/sec)

At this point the systems are in sync and I issued the mount.

Apr 22 09:59:40 vossdir1 kernel: drbd0: sock was shut down by peer

This is when I ran drbdadm down r0 on sys2.

Apr 22 09:59:40 vossdir1 kernel: drbd0: short read expecting header on sock: r=0
Apr 22 09:59:40 vossdir1 kernel: drbd0: meta connection shut down by peer.
Apr 22 09:59:40 vossdir1 kernel: drbd0: asender terminated
Apr 22 09:59:40 vossdir1 kernel: drbd0: worker terminated
Apr 22 09:59:40 vossdir1 kernel: EXT3 FS on drbd0, internal journal

The mount completes .... it looks like the journal replay was stalled
until this was done... is drbd holding some flag/semaphore that ext3 wants?

Apr 22 09:59:40 vossdir1 kernel: drbd0: Connection lost.
Apr 22 09:59:56 vossdir1 kernel: drbd0: Connection established.
Apr 22 09:59:56 vossdir1 kernel: drbd0: Resync started as source (need to sync 0 KB).
Apr 22 09:59:56 vossdir1 kernel: drbd0: Resync done (total 1 sec; 0 K/sec)

Here is where I bring sys2 up again.

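
The timestamps in the log give an upper bound on how long the mount was
stalled: it was issued some time after the "Resync done" line and
returned when the EXT3 line appeared. A minimal sketch of that
arithmetic, assuming both timestamps fall on the same day; the helper
name hms_to_secs is made up for illustration:

```shell
#!/bin/sh
# Convert an HH:MM:SS syslog timestamp to seconds since midnight.
# awk does the arithmetic so that fields such as "09" are read as
# decimal numbers rather than rejected as invalid octal.
hms_to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Timestamps copied from the log excerpt.
resync_done="09:57:22"   # "Resync done", just before the mount was issued
mount_done="09:59:40"    # "EXT3 FS on drbd0, internal journal"

stall=$(( $(hms_to_secs "$mount_done") - $(hms_to_secs "$resync_done") ))
echo "mount stalled for at most ${stall} seconds"
```

This comes out at 138 seconds, which matches the "about 2 minutes" of
waiting before drbdadm down was run on sys2.
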
This is easily repeatable .... umount also hangs in a similar way ...
and as a side effect prevents system shutdown too.
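
To find out where a hanging mount or umount is sleeping, one option is
to read its entry under /proc. This is only a sketch, assuming a Linux
kernel that exposes /proc/<pid>/status and /proc/<pid>/wchan; the
function name where_stuck and the PROC_ROOT override are hypothetical
and exist only so the function can be tried against a fake directory
tree:

```shell
#!/bin/sh
# Print the scheduler state and kernel wait channel of one process.
# Usage: where_stuck PID
where_stuck() {
    pid=$1
    # PROC_ROOT is an illustrative override, not a standard variable.
    root=${PROC_ROOT:-/proc}

    if [ ! -d "$root/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi

    # The "State:" line of status starts with a one-letter code;
    # D means uninterruptible sleep, typically waiting on I/O.
    state=$(awk '/^State:/ { print $2 }' "$root/$pid/status")

    # wchan names the kernel function the task sleeps in, if the
    # kernel provides it; fall back to "unknown" otherwise.
    wchan=$(cat "$root/$pid/wchan" 2>/dev/null)

    echo "pid=$pid state=${state:-?} wchan=${wchan:-unknown}"
}
```

Called with the PID of the stuck mount, a state of D together with a
wait channel inside the drbd module would suggest the process is
blocked on the device rather than inside ext3 itself.
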
Ron
> Lars Ellenberg
>_______________________________________________
>drbd-user mailing list
>drbd-user at lists.linbit.com
>http://lists.linbit.com/mailman/listinfo/drbd-user
>
>