[DRBD-user] Latest test - 0.7 cvs ... feedback ....bang !! :)

Ron O'Hara rono at sentuny.com.au
Thu May 6 18:43:25 CEST 2004


The bad news is that there seems to be a similar issue on the receiver side.
Node 1 was crashing with this problem; after taking the latest CVS fix,
node 1 is now OK, but node 2 connects and then locks the kernel hard,
with an error (it rolls off the screen) that includes:

  in_atomic ...
     __might_sleep

The node 1 half of the log is:
May  6 16:42:04 vossdir1 kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex
May  6 16:42:06 vossdir1 kernel: drbd1: Connection established.
May  6 16:42:06 vossdir1 kernel: drbd1: Resync started as source (need to sync 27010624 KB).
May  6 16:42:06 vossdir1 kernel: drbd0: Connection established.
May  6 16:42:07 vossdir1 kernel: drbd0: Resync started as source (need to sync 26744896 KB).
May  6 16:42:08 vossdir1 Keepalived_vrrp: Kernel is reporting: Group(VG1) UP
May  6 16:42:09 vossdir1 logger: DRBD waiting for sync to complete
May  6 16:42:20 vossdir1 kernel: drbd1: [drbd1_worker/8386] sock_sendmsg time expired, ko = 9
May  6 16:42:21 vossdir1 kernel: drbd0: [drbd0_worker/8387] sock_sendmsg time expired, ko = 9
May  6 16:42:23 vossdir1 kernel: drbd1: [drbd1_worker/8386] sock_sendmsg time expired, ko = 8
May  6 16:42:23 vossdir1 kernel: drbd1: PingAck did not arrive in time.
May  6 16:42:23 vossdir1 kernel: drbd1: asender terminated
May  6 16:42:23 vossdir1 kernel: drbd1: short read expecting header on sock: r=-512
May  6 16:42:23 vossdir1 kernel: drbd1: _drbd_send_page: size=4096 len=4008 sent=-4
May  6 16:42:23 vossdir1 kernel: drbd1: drbd_send_block() failed
May  6 16:42:23 vossdir1 kernel: drbd1: worker terminated
May  6 16:42:23 vossdir1 kernel: drbd1: Connection lost.
May  6 16:42:24 vossdir1 kernel: drbd0: [drbd0_worker/8387] sock_sendmsg time expired, ko = 8
May  6 16:42:24 vossdir1 kernel: drbd0: PingAck did not arrive in time.
May  6 16:42:24 vossdir1 kernel: drbd0: asender terminated
May  6 16:42:24 vossdir1 kernel: drbd0: short read expecting header on sock: r=-512
May  6 16:42:24 vossdir1 kernel: drbd0: _drbd_send_page: size=4096 len=2312 sent=-4
May  6 16:42:24 vossdir1 kernel: drbd0: drbd_send_block() failed
May  6 16:42:24 vossdir1 kernel: drbd0: ASSERT( list_empty(&mdev->data.work.q) ) in /usr/local/src/drbd-cvs-2004-05-
May  6 16:42:24 vossdir1 kernel: drbd0: worker terminated
May  6 16:42:24 vossdir1 kernel: drbd0: Connection lost.


I'll have to write the node 2 stuff down by hand if this is not enough ...


Philipp Reisner wrote:

>On Thursday 06 May 2004 15:29, Eugene Crosser wrote:
>  
>
>>Debug: sleeping function called from invalid context at mm/slab.c:1967
>>    
>>
>
>
>RCS file: /var/lib/cvs/drbd/drbd/drbd/drbd_receiver.c,v
>retrieving revision 1.97.2.147
>diff -u -p -u -r1.97.2.147 drbd_receiver.c
>--- drbd_receiver.c     6 May 2004 13:59:14 -0000       1.97.2.147
>+++ drbd_receiver.c     6 May 2004 15:55:43 -0000
>@@ -268,11 +268,14 @@ struct Tl_epoch_entry* drbd_get_ee(drbd_
>                        prepare_to_wait(&mdev->ee_wait, &wait,
>                                        TASK_INTERRUPTIBLE);
>                        if(!list_empty(&mdev->free_ee)) break;
>+                       spin_unlock_irq(&mdev->ee_lock);
>                        if( ( mdev->ee_vacant+mdev->ee_in_use) <
>                              mdev->conf.max_buffers ) {
>-                               if(drbd_alloc_ee(mdev,GFP_TRY)) break;
>+                               if(drbd_alloc_ee(mdev,GFP_TRY)) {
>+                                       spin_lock_irq(&mdev->ee_lock);
>+                                       break;
>+                               }
>                        }
>-                       spin_unlock_irq(&mdev->ee_lock);
>                        drbd_kick_lo(mdev);
>                        schedule();
>                        spin_lock_irq(&mdev->ee_lock);
>
>-philipp
>  
>



