[DRBD-user] mount hangs -0.7-cvs-2004-04-22 (today)

Thu Apr 22 14:21:51 CEST 2004

Lars Ellenberg wrote:

>/ 2004-04-22 11:04:53 +0000
>\ Ron O'Hara:
>  
>
>>Hi,
>>
>>I may have hit a misunderstanding of the correct operational sequence, 
>>or maybe some bug.  This is with 0.7 pulled from CVS today (22nd April 
>>2004).
>>
>>Scenario: two identical systems.
>>
>>Install drbd on both systems.
>>
>>modprobe drbd on sys1
>>modprobe drbd on sys2
>>
>>On sys1
>>       /sbin/drbdadm up all
>>       /sbin/drbdadm wait_connect all
>>Then on sys2
>>       /sbin/drbdadm up all
>>       /sbin/drbdadm wait_connect all
>>
>>Everything connects nicely: both ends are Secondary and sync up.
>>
>>On sys1
>>       /sbin/drbdadm primary r0
>>
>>Looks good in /proc/drbd     (Primary/Secondary as expected)
>>
>>
>>mount /dev/nb0 /data       <<<<< this just hangs
>>
>>
>>
>>If I go onto sys2 and run
>>       /sbin/drbdadm down all
>>then the mount completes, and I can connect sys2 again.
>>
>>Any ideas on where to look/trace/probe to get more info on what syscall 
>>'mount' is stuck in?
>>    
>>
>
>maybe, just maybe, the systems are too busy syncing and mount is just
>too slow: it needs to access/replay the journal, and DRBD needs to make
>sure sync and application requests do not conflict. hopefully this is
>no real bug and only needs tuning.
>
>how long did you wait when mount was "just hanging" ?
>  
>
I waited about 2 minutes. There was only 4 KB of data to be synced.

Apr 22 09:57:16 vossdir1 kernel: drbd: initialised. Version: 0.7-cvs-2004-04-22 (api:72/proto:72)
    new module loaded here

Apr 22 09:57:22 vossdir1 kernel: drbd0: size = 53919620 KB
Apr 22 09:57:22 vossdir1 kernel: drbd0: 4 KB marked out-of-sync by on disk bit-map.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Connection established.
Apr 22 09:57:22 vossdir1 kernel: drbd0: Resync started as source (need to sync 4 KB).
Apr 22 09:57:22 vossdir1 kernel: drbd0: Resync done (total 1 sec; 4 K/sec)
At this point the systems are in sync and I issued the mount.

Apr 22 09:59:40 vossdir1 kernel: drbd0: sock was shut down by peer
This is when I ran drbdadm down r0 on sys2.

Apr 22 09:59:40 vossdir1 kernel: drbd0: short read expecting header on sock: r=0
Apr 22 09:59:40 vossdir1 kernel: drbd0: meta connection shut down by peer.
Apr 22 09:59:40 vossdir1 kernel: drbd0: asender terminated
Apr 22 09:59:40 vossdir1 kernel: drbd0: worker terminated
Apr 22 09:59:40 vossdir1 kernel: EXT3 FS on drbd0, internal journal
The mount completes here. It looks like the journal replay was stalled
until this was done. Is drbd holding some flag/semaphore that ext3 wants?

Apr 22 09:59:40 vossdir1 kernel: drbd0: Connection lost.
Apr 22 09:59:56 vossdir1 kernel: drbd0: Connection established.
Apr 22 09:59:56 vossdir1 kernel: drbd0: Resync started as source (need to sync 0 KB).
Apr 22 09:59:56 vossdir1 kernel: drbd0: Resync done (total 1 sec; 0 K/sec)
Here is where I bring sys2 up again.
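
As a rough cross-check of the "about 2 minutes" figure, the gap between the
"Resync done" line (09:57:22) and the "EXT3 FS on drbd0" line (09:59:40) can be
computed from the log. This is only an upper bound on the stall, since the
mount was issued some time after the resync finished. The helper name to_secs
is made up for this sketch; it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Convert an HH:MM:SS syslog timestamp to seconds since midnight.
# awk treats "09" as decimal 9, so leading zeros are harmless here.
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Timestamps copied from the log lines above.
resync_done=$(to_secs "09:57:22")   # "Resync done (total 1 sec; 4 K/sec)"
mount_done=$(to_secs "09:59:40")    # "EXT3 FS on drbd0, internal journal"

# Upper bound on how long mount was stalled, in seconds.
stall=$((mount_done - resync_done))
echo "mount stalled for at most ${stall} seconds"
```

That works out to 138 seconds, which matches the wait described above.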

This is easily repeatable. umount also hangs in a similar way and, as a
side effect, prevents system shutdown too.
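
One way to get more information on where mount (or umount) is stuck might be
to look for processes in uninterruptible sleep (state "D" in ps) and the kernel
function they are waiting in (the WCHAN column). The sketch below is a small
filter over ps output; the function names is_uninterruptible and filter_blocked
are made up for illustration, and the WCHAN column may only show "-" or "?" if
the kernel symbol information is not available.

```shell
#!/bin/sh
# True if a ps STAT value starts with "D" (uninterruptible sleep),
# the state a process blocked inside a kernel I/O path usually shows.
is_uninterruptible() {
    case "$1" in
        D*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Read "pid stat wchan comm" lines on stdin and print only the
# entries that are in uninterruptible sleep.
filter_blocked() {
    while read -r pid stat wchan comm; do
        if is_uninterruptible "$stat"; then
            echo "$pid $stat $wchan $comm"
        fi
    done
}

# On a live system, feed it real data (assumes procps ps):
#   ps -eo pid=,stat=,wchan=,comm= | filter_blocked
```

Running the ps pipeline from a second terminal while the mount hangs should
list the mount process if it is blocked in the kernel. Running the mount under
strace (strace mount /dev/nb0 /data) would additionally show the last syscall
it entered before blocking.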

Ron

>	Lars Ellenberg