The bad news is there seems to be a similar issue on the receiver side.
I had node 1 crashing with this problem and after taking the latest CVS
fix, node 1 is now Ok, but node 2 connects and locks the kernel hard ..
with some problem (that rolls off the screen) .. in_atomic ... __might_sleep

The node 1 half of the log is:

May 6 16:42:04 vossdir1 kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex
May 6 16:42:06 vossdir1 kernel: drbd1: Connection established.
May 6 16:42:06 vossdir1 kernel: drbd1: Resync started as source (need to sync 27010624 KB).
May 6 16:42:06 vossdir1 kernel: drbd0: Connection established.
May 6 16:42:07 vossdir1 kernel: drbd0: Resync started as source (need to sync 26744896 KB).
May 6 16:42:08 vossdir1 Keepalived_vrrp: Kernel is reporting: Group(VG1) UP
May 6 16:42:09 vossdir1 logger: DRBD waiting for sync to complete
May 6 16:42:20 vossdir1 kernel: drbd1: [drbd1_worker/8386] sock_sendmsg time expired, ko = 9
May 6 16:42:21 vossdir1 kernel: drbd0: [drbd0_worker/8387] sock_sendmsg time expired, ko = 9
May 6 16:42:23 vossdir1 kernel: drbd1: [drbd1_worker/8386] sock_sendmsg time expired, ko = 8
May 6 16:42:23 vossdir1 kernel: drbd1: PingAck did not arrive in time.
May 6 16:42:23 vossdir1 kernel: drbd1: asender terminated
May 6 16:42:23 vossdir1 kernel: drbd1: short read expecting header on sock: r=-512
May 6 16:42:23 vossdir1 kernel: drbd1: _drbd_send_page: size=4096 len=4008 sent=-4
May 6 16:42:23 vossdir1 kernel: drbd1: drbd_send_block() failed
May 6 16:42:23 vossdir1 kernel: drbd1: worker terminated
May 6 16:42:23 vossdir1 kernel: drbd1: Connection lost.
May 6 16:42:24 vossdir1 kernel: drbd0: [drbd0_worker/8387] sock_sendmsg time expired, ko = 8
May 6 16:42:24 vossdir1 kernel: drbd0: PingAck did not arrive in time.
May 6 16:42:24 vossdir1 kernel: drbd0: asender terminated
May 6 16:42:24 vossdir1 kernel: drbd0: short read expecting header on sock: r=-512
May 6 16:42:24 vossdir1 kernel: drbd0: _drbd_send_page: size=4096 len=2312 sent=-4
May 6 16:42:24 vossdir1 kernel: drbd0: drbd_send_block() failed
May 6 16:42:24 vossdir1 kernel: drbd0: ASSERT( list_empty(&mdev->data.work.q) ) in /usr/local/src/drbd-cvs-2004-05-
May 6 16:42:24 vossdir1 kernel: drbd0: worker terminated
May 6 16:42:24 vossdir1 kernel: drbd0: Connection lost.

I'll have to write the node 2 stuff down by hand if this is not enough ...

Philipp Reisner wrote:

>On Thursday 06 May 2004 15:29, Eugene Crosser wrote:
>
>>Debug: sleeping function called from invalid context at mm/slab.c:1967
>
>RCS file: /var/lib/cvs/drbd/drbd/drbd/drbd_receiver.c,v
>retrieving revision 1.97.2.147
>diff -u -p -u -r1.97.2.147 drbd_receiver.c
>--- drbd_receiver.c 6 May 2004 13:59:14 -0000 1.97.2.147
>+++ drbd_receiver.c 6 May 2004 15:55:43 -0000
>@@ -268,11 +268,14 @@ struct Tl_epoch_entry* drbd_get_ee(drbd_
>         prepare_to_wait(&mdev->ee_wait, &wait,
>                         TASK_INTERRUPTIBLE);
>         if(!list_empty(&mdev->free_ee)) break;
>+        spin_unlock_irq(&mdev->ee_lock);
>         if( ( mdev->ee_vacant+mdev->ee_in_use) <
>               mdev->conf.max_buffers ) {
>-                if(drbd_alloc_ee(mdev,GFP_TRY)) break;
>+                if(drbd_alloc_ee(mdev,GFP_TRY)) {
>+                        spin_lock_irq(&mdev->ee_lock);
>+                        break;
>+                }
>         }
>-        spin_unlock_irq(&mdev->ee_lock);
>         drbd_kick_lo(mdev);
>         schedule();
>         spin_lock_irq(&mdev->ee_lock);
>
>-philipp
>
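
For anyone reading the quoted patch later: the idea, as far as I can tell, is to drop ee_lock before calling an allocator that may sleep, and to make sure the lock is held again on every path out of the wait loop. Below is a rough user-space sketch of that pattern with pthreads. It is not DRBD code. `struct pool`, `pool_grow()` and `pool_get()` are made-up names, a mutex stands in for the spinlock, `malloc()` for the sleeping allocation and `sched_yield()` for `schedule()`. Unlike the patch, the sketch re-checks the free list after re-taking the lock instead of breaking out directly.

```c
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the DRBD structures. */
struct entry { struct entry *next; };

struct pool {
    pthread_mutex_t lock;      /* plays the role of ee_lock */
    struct entry *free_list;   /* plays the role of free_ee */
    int total;                 /* entries allocated so far */
    int max;                   /* cap, in the spirit of max_buffers */
};

/* May block in malloc(), so it must be called WITHOUT p->lock held.
 * Calling it with the lock held would deadlock on the lock below,
 * which is the user-space cousin of "sleeping function called from
 * invalid context". Returns 1 if an entry was added, 0 otherwise. */
static int pool_grow(struct pool *p)
{
    struct entry *e = malloc(sizeof *e);
    if (!e)
        return 0;

    pthread_mutex_lock(&p->lock);
    if (p->total >= p->max) {          /* someone else grew it first */
        pthread_mutex_unlock(&p->lock);
        free(e);
        return 0;
    }
    e->next = p->free_list;
    p->free_list = e;
    p->total++;
    pthread_mutex_unlock(&p->lock);
    return 1;
}

/* Take one entry, growing the pool if allowed, waiting otherwise. */
static struct entry *pool_get(struct pool *p)
{
    struct entry *e;

    pthread_mutex_lock(&p->lock);
    while ((e = p->free_list) == NULL) {
        int can_grow = p->total < p->max;   /* read under the lock */

        pthread_mutex_unlock(&p->lock);     /* drop before blocking */
        if (!can_grow || !pool_grow(p))
            sched_yield();                  /* stand-in for schedule() */
        pthread_mutex_lock(&p->lock);       /* held again on every path */
    }
    p->free_list = e->next;
    pthread_mutex_unlock(&p->lock);
    return e;
}
```

The single re-lock at the bottom of the loop is a design choice for the sketch: it keeps "lock held at the loop test" true without needing a separate re-lock before each break.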