/ 2004-06-05 01:34:54 -0400
\ T. Howell-Cintron:
>
> I'm running 0.7pre7 2004/06/01. The (initial) primary is an FC2 2.6.5
> and runs fine, save for eventually noting that the connection to the
> secondary was lost. The secondary is a FC1 2.4.22 machine and loads the
> module fine, but shortly after starting to sync I get several oops (see
> attached).
we have not tested drbd 0.7 on a 2.4 kernel in a while,
we only made sure it still compiles.
most likely there is some typo or misconception
in the 2.4 compat wrappers.
if I find the time, or it occurs to me that it's obvious,
I'll dig into this, but don't expect a patch too soon.
meanwhile, please use 2.6 kernels only.
> #
> # initial module load, sync, etc.
> #
>
> drbd: initialised. Version: 0.7-pre7 cvs $Date: 2004/06/01 07:33:19 $ (api:73/proto:72)
> drbd0: size = 59713605 KB
> drbd0: 56879176 KB marked out-of-sync by on disk bit-map.
> drbd0: No usable activity log found.
> drbd0: Connection established.
> drbd0: Peer switched to Primary state
> drbd0: Resync started as target (need to sync 56879173 KB).
> Unable to handle kernel NULL pointer dereference at virtual address 00000004
> printing eip:
> d08f85f7
> *pde = 0b9a1067
> *pte = 00000000
> Oops: 0002
> drbd parport_pc lp parport 3c59x keybdev mousedev hid input usb-uhci usbcore ext3 jbd aic7xxx sd_mod scsi_mod
> CPU: 0
> EIP: 0060:[<d08f85f7>] Not tainted
> EFLAGS: 00010086
>
> EIP is at finish_wait [drbd] 0x27 (2.4.22-1.2188.nptl)
aha.
it just occurred to me that it is obvious.
first, your kernel already contains a backport of those functions;
second, our finish_wait backport was wrong.
ok, the following patch should fix it, and should be in cvs soonish...
diff -u -p -r1.97.2.165 drbd_receiver.c
=======================
--- drbd_receiver.c 1 Jun 2004 14:29:07 -0000 1.97.2.165
+++ drbd_receiver.c 5 Jun 2004 07:54:52 -0000
@@ -248,11 +248,11 @@ STATIC void prepare_to_wait(wait_queue_h
 {
 	unsigned long flags;
+	__set_current_state(state);
 	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
 	spin_lock_irqsave(&q->lock, flags);
 	if (list_empty(&wait->task_list))
 		__add_wait_queue(q, wait);
-	set_current_state(state);
 	spin_unlock_irqrestore(&q->lock, flags);
 }
@@ -261,10 +261,11 @@ STATIC void finish_wait(wait_queue_head_
 	unsigned long flags;
 	__set_current_state(TASK_RUNNING);
-
-	spin_lock_irqsave(&q->lock, flags);
-	list_del_init(&wait->task_list);
-	spin_unlock_irqrestore(&q->lock, flags);
+	if (!list_empty(&wait->task_list)) {
+		spin_lock_irqsave(&q->lock, flags);
+		list_del_init(&wait->task_list);
+		spin_unlock_irqrestore(&q->lock, flags);
+	}
 }
 #define DEFINE_WAIT(name) DECLARE_WAITQUEUE(name,current)
=======================
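to illustrate the guard in the finish_wait hunk, here is a minimal userspace sketch. it is not the drbd code: the list helpers are re-implemented by hand after the usual circular doubly-linked list semantics, and struct wait_queue_head, struct wait_entry, finish_wait_sketch and the lock_taken counter are hypothetical stand-ins (the counter only models taking the spinlock). it shows that an entry which is not queued is left alone, and a queued entry is unlinked under the "lock".

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly-linked list. */
struct list_head { struct list_head *next, *prev; };

static inline void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* An entry counts as "empty" only if it points back at itself. */
static inline int list_empty(const struct list_head *h)
{
    return h->next == h;
}

static inline void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static inline void list_del_init(struct list_head *entry)
{
    /* Both neighbours must be valid, or the stores below hit a bad address. */
    assert(entry->next != NULL && entry->prev != NULL);
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

/* Hypothetical stand-ins for the wait queue head and a wait entry. */
struct wait_queue_head { struct list_head task_list; int lock_taken; };
struct wait_entry { struct list_head task_list; };

/* Same shape as the patched hunk: only lock and unlink if still queued. */
static inline void finish_wait_sketch(struct wait_queue_head *q,
                                      struct wait_entry *w)
{
    if (!list_empty(&w->task_list)) {
        q->lock_taken++;                /* models spin_lock_irqsave() */
        list_del_init(&w->task_list);
        /* spin_unlock_irqrestore() would go here */
    }
}
```

note that the guard relies on the entry having been initialised to point at itself. an entry whose next/prev pointers are NULL does not count as empty by this test, so it would still reach list_del_init.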
> #
> # after running 'drbdadm disconnect..'
> #
> <1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
> printing eip:
> c012731f
> *pde = 00000000
> Oops: 0000
> drbd parport_pc lp parport 3c59x keybdev mousedev hid input usb-uhci usbcore ext3 jbd aic7xxx sd_mod scsi_mod
> CPU: 0
> EIP: 0060:[<c012731f>] Not tainted
> EFLAGS: 00010006
>
> EIP is at force_sig_info [kernel] 0x2f (2.4.22-1.2188.nptl)
> eax: 00000014 ebx: cfa47b9c ecx: cb4c4000 edx: 00000000
> esi: cfa47800 edi: 00000001 ebp: cfa47940 esp: cb0abed0
> ds: 0068 es: 0068 ss: 0068
> Process drbdsetup (pid: 2711, stackpage=cb0ab000)
> Stack: 00002b00 00000001 00000001 00000282 cfa47b9c cfa47800 00000001 cfa47940
> c0127bef 00000001 00000001 cb4c4000 d08f1d0d 00000001 cb4c4000 d08f19db
> 00000000 c01452f4 00000296 00000000 cfa47800 d08f0a27 cfa47b9c 00000000
> Call Trace: [<c0127bef>] force_sig [kernel] 0x1f (0xcb0abef0)
hm. this one is not that easy.
we don't register signal handlers, but use force_sig nevertheless.
the RH 2.4 force_sig_info does not seem to like this, and chokes somewhere in
kernel/signal.c ...
sorry, I don't see the exact cause right now, and have no quick fix either.
Lars Ellenberg