[DRBD-user] Invalidate

Lars Ellenberg Lars.Ellenberg at linbit.com
Thu Aug 4 17:11:53 CEST 2005



> I will attach it.

> Aug  4 12:24:16 test2 kernel: Oops: 0002 [#3]
> Aug  4 12:24:16 test2 kernel: SMP
> Aug  4 12:24:16 test2 kernel: Modules linked in: drbd iptable_filter ehci_hcd uhci_hcd
> Aug  4 12:24:16 test2 kernel: CPU:    2
> Aug  4 12:24:16 test2 kernel: EIP:    0060:[_spin_lock_irqsave+5/29]    Not tainted VLI
> Aug  4 12:24:16 test2 kernel: EIP:    0060:[<c035d683>]    Not tainted VLI
> Aug  4 12:24:16 test2 kernel: EFLAGS: 00010006   (2.6.12.2)
> Aug  4 12:24:16 test2 kernel: EIP is at _spin_lock_irqsave+0x5/0x1d

someone tried to treat a NULL pointer as a spinlock.

> Aug  4 12:24:16 test2 kernel: eax: 00000206   ebx: 00000001   ecx: 00000001   edx: 65726d78
> Aug  4 12:24:16 test2 drbd: Command '/sbin/drbdsetup /dev/drbd0 down' terminated with exit code -1

someone seems to be drbdsetup.

> Aug  4 12:24:16 test2 kernel: esi: f6894a20   edi: f3fa3000   ebp: 00000001   esp: eea5de68
> Aug  4 12:24:16 test2 drbd: drbdsetup exited with code -1
> Aug  4 12:24:16 test2 kernel: ds: 007b   es: 007b   ss: 0068
> Aug  4 12:24:16 test2 drbd: .
> Aug  4 12:24:16 test2 kernel: Process drbdsetup (pid: 12388, threadinfo=eea5c000 task=eee3f520)
> Aug  4 12:24:16 test2 drbd: ERROR: Module drbd is in use
> Aug  4 12:24:16 test2 kernel: Stack: c0124548 00000046 f3fa33d0 42f1d434 00000002 f3fa3400 f3fa3000 c0124e5e
> Aug  4 12:24:16 test2 kernel:        00000001 00000001 f6894a20 f8c71b57 00000001 f6894a20 f3fa33d0 f8c7007b
> Aug  4 12:24:16 test2 kernel:        f8c7007b 00000282 f3fa33d0 42f1d434 f3fa3000 f8c63671 f3fa3400 00000000
> Aug  4 12:24:16 test2 kernel: Call Trace:
> Aug  4 12:24:16 test2 kernel:  [force_sig_info+39/165] force_sig_info+0x27/0xa5

seems to be in force_sig_info

> Aug  4 12:24:16 test2 kernel:  [<c0124548>] force_sig_info+0x27/0xa5
> Aug  4 12:24:16 test2 kernel:  [force_sig+31/35] force_sig+0x1f/0x23
> Aug  4 12:24:16 test2 kernel:  [<c0124e5e>] force_sig+0x1f/0x23
> Aug  4 12:24:16 test2 kernel:  [pg0+947690327/1068852224] _drbd_thread_stop+0x7d/0x1e9 [drbd]
> Aug  4 12:24:16 test2 kernel:  [<f8c71b57>] _drbd_thread_stop+0x7d/0x1e9 [drbd]

called from _drbd_thread_stop via force_sig

> Aug  4 12:24:16 test2 kernel:  [pg0+947683451/1068852224] drbd_rs_begin_io+0x8a/0x5ea [drbd]
> Aug  4 12:24:16 test2 kernel:  [<f8c7007b>] drbd_rs_begin_io+0x8a/0x5ea [drbd]

this does not make any sense.
the ioctl code won't have any reason to be in drbd_rs_begin_io.
stack dump seems to be unreliable here.

> Aug  4 12:24:16 test2 kernel:  [pg0+947631729/1068852224] drbd_ioctl+0xb7f/0xd2d [drbd]
> Aug  4 12:24:16 test2 kernel:  [<f8c63671>] drbd_ioctl+0xb7f/0xd2d [drbd]

well. this is how I read it:

the code path in question takes a spinlock. while holding this spinlock,
it accesses some task pointer and tests whether it is NULL; if not NULL,
it forces a signal to the task pointed to.  the task pointer in
question is only ever changed while holding the very same spinlock.
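
to make that pattern concrete, here is a minimal userspace sketch in C.
it is not the DRBD source: struct worker, struct task, fake_force_sig and
the C11 atomic_flag spinlock are all hypothetical stand-ins, chosen only
to show "check and signal under the same lock that guards every write".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real DRBD or kernel types. */
struct task {
    int pending_signal;          /* last signal forced onto this task */
};

struct worker {
    atomic_flag lock;            /* models the spinlock guarding `task` */
    struct task *task;           /* NULL while no thread is running */
};

static void worker_lock(struct worker *w)
{
    /* Spin until the flag was previously clear, i.e. we own the lock. */
    while (atomic_flag_test_and_set(&w->lock))
        ;
}

static void worker_unlock(struct worker *w)
{
    atomic_flag_clear(&w->lock);
}

/* Models force_sig(): records the signal on the target task. */
static void fake_force_sig(int sig, struct task *t)
{
    assert(t != NULL);           /* callers must have checked already */
    t->pending_signal = sig;
}

/* Every writer of w->task goes through here, under the same lock. */
static void worker_set_task(struct worker *w, struct task *t)
{
    worker_lock(w);
    w->task = t;
    worker_unlock(w);
}

/* The stop path: test the pointer and signal while holding the lock.
 * Returns 1 if a signal was sent, 0 if there was no task. */
static int worker_stop(struct worker *w, int sig)
{
    int sent = 0;

    worker_lock(w);
    if (w->task != NULL) {
        fake_force_sig(sig, w->task);
        sent = 1;
    }
    worker_unlock(w);
    return sent;
}
```

as long as all writers use worker_set_task(), worker_stop() can never see
the pointer change between the NULL test and the signal. what the lock
does not guarantee is that the object behind a non-NULL pointer is still
intact.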

the force_sig then goes to force_sig_info, which does a
spin_lock_irqsave(&t->sighand->siglock, flags), and that (or some component of
it) dereferences to NULL.
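
a small sketch of why a NULL component in that chain ends up as a bogus
lock address. the struct layout below is made up for illustration (the
real kernel structures differ), and siglock_address() and
siglock_is_reachable() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up layout, for illustration only. */
struct fake_sighand {
    int  count;
    long actions[64];
    int  siglock;                /* stands in for the spinlock */
};

struct fake_task {
    struct fake_sighand *sighand;
};

/* The address a lock operation on t->sighand->siglock would touch.
 * Computed with offsetof so that a NULL sighand is never dereferenced. */
static uintptr_t siglock_address(const struct fake_task *t)
{
    assert(t != NULL);
    return (uintptr_t)t->sighand + offsetof(struct fake_sighand, siglock);
}

/* True only if every pointer in the chain is non-NULL. */
static int siglock_is_reachable(const struct fake_task *t)
{
    return t != NULL && t->sighand != NULL;
}
```

with sighand == NULL the computed address is just the member offset, a
small number near zero on common platforms, and the first write of the
lock operation faults there. a NULL test on the task pointer alone does
not catch this case.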

"this cannot happen"

(unless something else before that went terribly wrong, and some of our
threads died without that being noticed. or something corrupts memory.)

but I may be wrong, of course...

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


