The bad news is that there seems to be a similar issue on the receiver side.
I had node 1 crashing with this problem. After taking the latest CVS
fix, node 1 is now OK, but node 2 connects and then locks the kernel hard
with an error that scrolls off the screen. The parts I could read mention:
in_atomic ...
__might_sleep
The node 1 half of the log is:
May 6 16:42:04 vossdir1 kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex
May 6 16:42:06 vossdir1 kernel: drbd1: Connection established.
May 6 16:42:06 vossdir1 kernel: drbd1: Resync started as source (need to sync 27010624 KB).
May 6 16:42:06 vossdir1 kernel: drbd0: Connection established.
May 6 16:42:07 vossdir1 kernel: drbd0: Resync started as source (need to sync 26744896 KB).
May 6 16:42:08 vossdir1 Keepalived_vrrp: Kernel is reporting: Group(VG1) UP
May 6 16:42:09 vossdir1 logger: DRBD waiting for sync to complete
May 6 16:42:20 vossdir1 kernel: drbd1: [drbd1_worker/8386] sock_sendmsg time expired, ko = 9
May 6 16:42:21 vossdir1 kernel: drbd0: [drbd0_worker/8387] sock_sendmsg time expired, ko = 9
May 6 16:42:23 vossdir1 kernel: drbd1: [drbd1_worker/8386] sock_sendmsg time expired, ko = 8
May 6 16:42:23 vossdir1 kernel: drbd1: PingAck did not arrive in time.
May 6 16:42:23 vossdir1 kernel: drbd1: asender terminated
May 6 16:42:23 vossdir1 kernel: drbd1: short read expecting header on sock: r=-512
May 6 16:42:23 vossdir1 kernel: drbd1: _drbd_send_page: size=4096 len=4008 sent=-4
May 6 16:42:23 vossdir1 kernel: drbd1: drbd_send_block() failed
May 6 16:42:23 vossdir1 kernel: drbd1: worker terminated
May 6 16:42:23 vossdir1 kernel: drbd1: Connection lost.
May 6 16:42:24 vossdir1 kernel: drbd0: [drbd0_worker/8387] sock_sendmsg time expired, ko = 8
May 6 16:42:24 vossdir1 kernel: drbd0: PingAck did not arrive in time.
May 6 16:42:24 vossdir1 kernel: drbd0: asender terminated
May 6 16:42:24 vossdir1 kernel: drbd0: short read expecting header on sock: r=-512
May 6 16:42:24 vossdir1 kernel: drbd0: _drbd_send_page: size=4096 len=2312 sent=-4
May 6 16:42:24 vossdir1 kernel: drbd0: drbd_send_block() failed
May 6 16:42:24 vossdir1 kernel: drbd0: ASSERT( list_empty(&mdev->data.work.q) ) in /usr/local/src/drbd-cvs-2004-05-
May 6 16:42:24 vossdir1 kernel: drbd0: worker terminated
May 6 16:42:24 vossdir1 kernel: drbd0: Connection lost.
I'll have to write the node 2 stuff down by hand if this is not enough ...
Philipp Reisner wrote:
>On Thursday 06 May 2004 15:29, Eugene Crosser wrote:
>
>>Debug: sleeping function called from invalid context at mm/slab.c:1967
>
>RCS file: /var/lib/cvs/drbd/drbd/drbd/drbd_receiver.c,v
>retrieving revision 1.97.2.147
>diff -u -p -u -r1.97.2.147 drbd_receiver.c
>--- drbd_receiver.c 6 May 2004 13:59:14 -0000 1.97.2.147
>+++ drbd_receiver.c 6 May 2004 15:55:43 -0000
>@@ -268,11 +268,14 @@ struct Tl_epoch_entry* drbd_get_ee(drbd_
>         prepare_to_wait(&mdev->ee_wait, &wait,
>                         TASK_INTERRUPTIBLE);
>         if(!list_empty(&mdev->free_ee)) break;
>+        spin_unlock_irq(&mdev->ee_lock);
>         if( ( mdev->ee_vacant+mdev->ee_in_use) <
>               mdev->conf.max_buffers ) {
>-                if(drbd_alloc_ee(mdev,GFP_TRY)) break;
>+                if(drbd_alloc_ee(mdev,GFP_TRY)) {
>+                        spin_lock_irq(&mdev->ee_lock);
>+                        break;
>+                }
>         }
>-        spin_unlock_irq(&mdev->ee_lock);
>         drbd_kick_lo(mdev);
>         schedule();
>         spin_lock_irq(&mdev->ee_lock);
>
>-philipp
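
For anyone following along, the locking change in the quoted patch (drop ee_lock before an allocation that may sleep, and retake it before leaving the loop) can be sketched in userspace C. This is only an illustration under stated assumptions and not DRBD code: `struct pool`, `pool_grow` and `pool_get` are hypothetical names, a pthread mutex stands in for the spinlock, and `malloc` stands in for the kernel allocation. It also assumes the grow step takes the lock itself to update the counters.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device state; not a DRBD structure. */
struct pool {
    pthread_mutex_t lock;  /* plays the role of ee_lock */
    int vacant;            /* entries that are free to hand out */
    int in_use;            /* entries currently handed out */
    int max_buffers;       /* upper bound on vacant + in_use */
};

/*
 * Add one entry to the pool. This may block (malloc stands in for a
 * sleeping allocation), so it must be called WITHOUT the lock held.
 * Returns 1 on success, 0 on allocation failure.
 */
static int pool_grow(struct pool *p)
{
    void *entry = malloc(64);
    if (entry == NULL)
        return 0;
    free(entry);                       /* sketch only: entries are just counted */

    pthread_mutex_lock(&p->lock);      /* short critical section, no blocking */
    p->vacant++;
    pthread_mutex_unlock(&p->lock);
    return 1;
}

/*
 * Take one entry from the pool. Returns 1 on success, 0 if nothing is
 * free and the pool cannot grow. The lock is held at the top of every
 * loop iteration and released on every exit path.
 */
static int pool_get(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    for (;;) {
        if (p->vacant > 0) {           /* something is free: take it */
            p->vacant--;
            p->in_use++;
            pthread_mutex_unlock(&p->lock);
            return 1;
        }

        int may_grow = p->vacant + p->in_use < p->max_buffers;

        /* Drop the lock BEFORE anything that can block. */
        pthread_mutex_unlock(&p->lock);

        if (!may_grow || !pool_grow(p))
            return 0;                  /* the kernel code would wait here instead */

        /* Retake the lock and re-check, since another thread may have
         * taken the new entry in the meantime. */
        pthread_mutex_lock(&p->lock);
    }
}
```

One difference from the quoted patch: the sketch loops back and re-checks the free count after retaking the lock instead of breaking out directly, and it returns failure where the kernel code would call schedule() and wait.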