[Drbd-dev] DRBD-8: another crash following disk write failures

Thu Jan 25 22:39:44 CET 2007

Just ran into another crash when we get disk write failures. This is related to the other recent problems where we could try to use the activity log after the disk is detached, but in this case it's an actual crash rather than a BUG() call:

Unable to handle kernel NULL pointer dereference at virtual address 000000ac
EIP is at w_io_error+0x18/0xa0 [drbd]
Call Trace:
 [<c0105431>] show_stack_log_lvl+0xa1/0xe0
 [<c0105621>] show_registers+0x181/0x200
 [<c0105840>] die+0x100/0x1a0
 [<c0115746>] do_page_fault+0x3c6/0x8b1
 [<c0105097>] error_code+0x2b/0x30
 [<ee3b4b0e>] drbd_worker+0x2de/0x4b5 [drbd]
 [<ee3c6eec>] drbd_thread_setup+0x8c/0x100 [drbd]
 [<c0102ec5>] kernel_thread_helper+0x5/0x10

And the code in question is this:

int w_io_error(drbd_dev* mdev, struct drbd_work* w,int cancel)
{
    drbd_request_t *req = (drbd_request_t*)w;
    int ok;

    /* FIXME send a "set_out_of_sync" packet to the peer
     * in the PassOn case...
     * in the Detach (or Panic) case, we (try to) send
     * a "we are diskless" param packet anyways, and the peer
     * will then set the FullSync bit in the meta data ...
     */
    D_ASSERT(mdev->bc->dc.on_io_error != PassOn);

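Oops: mdev->bc can be NULL by the time we get here, so the assert itself dereferences a NULL pointer. The faulting address 000000ac is presumably the offset of dc.on_io_error within the structure mdev->bc points to.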

I propose simply commenting out the assert for now (patch attached). I left the code in place rather than deleting it because of the FIXME comment above it; I didn't want to lose that!
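
For comparison, an alternative to commenting the assert out would be to evaluate it only while the backing device is still attached. The following is a minimal userspace sketch of that idea, not the attached patch: the struct layouts, the D_ASSERT stand-in, the assert_failures counter and the helper check_io_error_policy() are all hypothetical mocks, and it ignores whatever locking the real driver would need to keep mdev->bc from being cleared between the check and the dereference.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-ins for the DRBD types; the layouts are hypothetical. */
enum io_error_handler { PassOn, Panic, Detach };

struct disk_conf { enum io_error_handler on_io_error; };
struct drbd_backing_dev { struct disk_conf dc; };
struct drbd_dev { struct drbd_backing_dev *bc; /* NULL once the disk is detached */ };

/* Counts failed checks so the behaviour can be observed in a test. */
static int assert_failures;

/* Mock of D_ASSERT (an assumption): report the failure and carry on. */
#define D_ASSERT(exp)                                          \
    do {                                                       \
        if (!(exp)) {                                          \
            assert_failures++;                                 \
            fprintf(stderr, "ASSERT( %s ) failed\n", #exp);    \
        }                                                      \
    } while (0)

/* Hypothetical helper: only look at the policy if mdev->bc is still there. */
void check_io_error_policy(struct drbd_dev *mdev)
{
    if (mdev->bc != NULL)
        D_ASSERT(mdev->bc->dc.on_io_error != PassOn);
}
```

With a detached device (bc == NULL) the helper returns without touching the pointer; with an attached device it still flags the PassOn case.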

Simon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ava-1617.patch
Type: application/octet-stream
Size: 626 bytes
Desc: ava-1617.patch
Url : http://lists.linbit.com/pipermail/drbd-dev/attachments/20070125/09ab6873/ava-1617.obj