[DRBD-user] oops on 2.6.5-rc3-bk2 + drbd-0.7-cvs

Philipp Reisner philipp.reisner at linbit.com
Thu Apr 8 12:25:22 CEST 2004



On Thursday 08 April 2004 11:53, Andreas Schultz wrote:
> On Wednesday 07 April 2004 17:29, Philipp Reisner wrote:
>
> [...]
>
> > Could you try this patch ?
> >
> > - There is a good chance that it will "just work".
>
> It works, kind of.
>
> Normal operation appears to be OK: initial resync, normal work and so on.
>
> A problem occurred when I attempted to repair the underlying software RAID5
> array while a drbd resync was underway. The RAID rebuild caused the drbd
> sync to stall, and drbd did not recover after the RAID rebuild had
> completed. Stopping the secondary (drbdadm down ...) worked, but
> stopping the primary failed:
>
> sdev01:~# drbdadm down db
> drbd0: worker terminated
>
> Child process does not terminate!
> Exiting.
>
> # tail /var/log/syslog
> Apr  8 10:57:26 sdev01 kernel: drbd0: Resync started as source (need to sync 40.
> Apr  8 11:09:20 sdev01 kernel: drbd0: meta connection shut down by peer.
> Apr  8 11:09:20 sdev01 kernel: drbd0: asender terminated
> Apr  8 11:10:21 sdev01 kernel: drbd0: worker terminated
>
> # ps xaw
>   PID TTY      STAT   TIME COMMAND
> 22377 ?        DW     0:14 [drbd0_receiver]
>   625 ttyS0    D      0:00 /sbin/drbdsetup /dev/drbd0 down
>
> backtrace for drbd0_receiver (SysRq-T):
> _drbd_process_ee
> drbd0_receive D 00000086     0 22377      1           625  9715 (L-TLB)
> c2441f08 00000046 00000003 00000086 00000001 00000000 f77e02a0 00000000
>        c2440000 c2441f08 f8c4d347 00000001 d283346c d2833000 c2441ef4
> c2441eec c01fcf96 c1a0cbe0 000186a0 93e1f740 000f7a28 cb063388 d2833000
> 00000000 Call Trace:
>  [<f8c4d347>] _drbd_process_ee+0x137/0x1d0 [drbd]
>  [<c01fcf96>] generic_unplug_device+0x66/0x70
>  [<f8c4d4ec>] drbd_get_ee+0x10c/0x280 [drbd]
>  [<c011e270>] default_wake_function+0x0/0x20
>  [<f8c4ef1a>] receive_DataRequest+0xca/0x290 [drbd]
>  [<f8c4de5d>] drbd_recv_header+0x1d/0xe0 [drbd]
>  [<f8c4ff52>] drbdd_init+0xb2/0x6c0 [drbd]
>  [<c0124d24>] daemonize+0xa4/0xb0
>  [<f8c472cd>] drbd_thread_setup+0x3d/0x60 [drbd]
>  [<f8c47290>] drbd_thread_setup+0x0/0x60 [drbd]
>  [<c0106005>] kernel_thread_helper+0x5/0x10
>
> sdev01:~# cat /proc/modules
> drbd 91616 1 - Live 0xf8c45000
>
> sdev01:/# addr2line -e /usr/src/kbuild-srv-dev/drivers/block/drbd/drbd.o 8347
> /usr/src/linux-2.6.5-vs/drivers/block/drbd/drbd_receiver.c:350
>
> The source line above looks a bit strange. I have probably
> misunderstood something about how the real address/offset has to be
> calculated. Let me know if you need any additional information.
>
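The offset handed to addr2line can be checked by hand from the numbers in this report. A minimal Python sketch, assuming the 2.6 call-trace frame format shown above: 0xf8c4d347 minus the /proc/modules load address 0xf8c45000 gives 0x8347, the value that was passed to addr2line, and subtracting the in-function offset 0x137 puts the start of _drbd_process_ee at 0x8210 relative to that load address. The helper names are hypothetical and not part of any kernel or DRBD tool.

```python
import re

# Matches one 2.6-style call-trace frame, e.g.
#  [<f8c4d347>] _drbd_process_ee+0x137/0x1d0 [drbd]
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([A-Za-z0-9_.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_frame(line):
    """Return (address, symbol, offset_in_symbol, symbol_size) or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr, sym, off, size = m.groups()
    return int(addr, 16), sym, int(off, 16), int(size, 16)

def offset_from_load_base(addr, module_base):
    """Address relative to the load address shown in /proc/modules."""
    return addr - module_base

def expected_nm_address(addr, sym_offset, module_base):
    """Where the symbol should sit in drbd.o *if* the module's .text
    begins exactly at the /proc/modules load address (an assumption)."""
    return addr - sym_offset - module_base

if __name__ == "__main__":
    frame = " [<f8c4d347>] _drbd_process_ee+0x137/0x1d0 [drbd]"
    addr, sym, off, size = parse_frame(frame)
    base = 0xf8c45000  # from "drbd 91616 1 - Live 0xf8c45000"
    print(sym, hex(offset_from_load_base(addr, base)))  # _drbd_process_ee 0x8347
    print(hex(expected_nm_address(addr, off, base)))    # 0x8210
```

If `nm drbd.o` lists _drbd_process_ee at an address other than 0x8210, the load-address assumption does not hold for this object, and the symbol+offset form (the symbol's address from nm plus 0x137) is the safer input for addr2line.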

Assuming it was blocked in _drbd_process_ee():

_drbd_process_ee() tries to send ACKs. What happens if it cannot
get them through? Why is it in "D" state? Is it blocked
on the send_mutex?
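The send_mutex question describes a familiar stall: one path holds a send lock while its own send cannot complete because the peer has stopped reading, so any other path that needs the same lock to emit ACKs waits as well. A rough user-space model in Python, assuming a single lock that serializes all sends on one connection; send_mutex here is a hypothetical stand-in, this is not DRBD code, and a Python lock wait is not the kernel's uninterruptible "D" sleep.

```python
import socket
import threading

# Hypothetical stand-in for a lock that serializes every send on one
# connection (loosely modelled on the send_mutex question; not DRBD code).
send_mutex = threading.Lock()

def bulk_sender(sock, holding):
    """Takes the send lock, then blocks in sendall() because the peer never reads."""
    with send_mutex:
        holding.set()
        try:
            sock.sendall(b"x" * (8 * 1024 * 1024))  # far more than the socket buffers hold
        except OSError:
            pass  # the socket timeout ends the stall; the lock is released on exit

def try_send_ack(timeout):
    """ACK path: it must take the same lock before it may send anything."""
    acquired = send_mutex.acquire(timeout=timeout)
    if acquired:
        send_mutex.release()
    return acquired

def demo():
    local, peer = socket.socketpair()  # 'peer' is never read from
    local.settimeout(2.0)              # bound the stall so the demo terminates
    holding = threading.Event()
    t = threading.Thread(target=bulk_sender, args=(local, holding), daemon=True)
    t.start()
    holding.wait()
    ack_during_stall = try_send_ack(timeout=0.5)  # lock still held by the stuck sender
    t.join()
    ack_after_stall = try_send_ack(timeout=0.5)
    local.close()
    peer.close()
    return ack_during_stall, ack_after_stall

if __name__ == "__main__":
    print(demo())  # (False, True)
```

In this model the ACK path stays blocked exactly as long as the stuck sender does. On a real TCP connection the same situation would typically show up in netstat -t as a Send-Q that grows and never drains.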

The output of "netstat -t" in this situation would be helpful.

-Philipp
-- 
: Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :


