[DRBD-user] got NegRSDReply on secondary server

Thu Oct 6 13:58:05 CEST 2005

/ 2005-10-06 17:20:18 +0900
\ HIROSE, Masaaki:
> Hello,
> 
> drbd_worker.c in drbd-0.7.13:
> 
>    589	int w_e_end_rsdata_req(drbd_dev *mdev, struct drbd_work *w, int cancel)
> (snip)
>    613		} else {
> *  614			ok=drbd_send_ack(mdev,NegRSDReply,e);
>    615			if (DRBD_ratelimit(5*HZ,5))
> *  616				ERR("Sending NegDReply. I guess it gets messy.\n");
>    617			drbd_io_error(mdev);
>    618		}
> 
> drbd_send_ack() sends NegRSDReply, but the ERR message says NegDReply.
> Which is correct?

NegRSDReply.
The code is correct; the error message has a copy'n'paste error.
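
One way to keep such a message from drifting away from the packet that is
actually sent is to build the text from the packet value itself. The
following is only a standalone sketch, not the drbd-0.7.13 code: the enum,
neg_packet_name() and format_neg_msg() are hypothetical names made up for
illustration; only the strings "NegDReply" and "NegRSDReply" and the
message wording come from the excerpt above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical enum: the packet names come from the thread, the
 * identifiers and values are made up for this sketch. */
enum neg_packet { NEG_D_REPLY, NEG_RS_D_REPLY };

/* Map a packet value to the name that should appear in the log. */
static const char *neg_packet_name(enum neg_packet p)
{
    switch (p) {
    case NEG_D_REPLY:    return "NegDReply";
    case NEG_RS_D_REPLY: return "NegRSDReply";
    }
    return "unknown";
}

/* Build the log line from the same value that would be sent, so the
 * text cannot name a different packet than the one on the wire. */
static int format_neg_msg(char *buf, size_t len, enum neg_packet p)
{
    return snprintf(buf, len, "Sending %s. I guess it gets messy.\n",
                    neg_packet_name(p));
}
```

Called with NEG_RS_D_REPLY, format_neg_msg() produces
"Sending NegRSDReply. I guess it gets messy.", which matches what the
peer reports receiving.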

> A disk error occurred on the primary DRBD server (on-io-error: detach).
> The primary server says:
> 
>   kernel: drbd1: Local IO failed. Detaching...
>   kernel: drbd1: Sending NegDReply. I guess it gets messy.
>   kernel: drbd1: Notified peer that my disk is broken.
> 
> and the secondary got NegRSDReply and says:
> 
>   kernel: drbd1: Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk. 
> 
> and the secondary server does drbd_panic.
> 
> Why does DRBD panic the kernel on the secondary (no error), instead of
> on the primary (disk error)?

Probably because you configured "on-io-error = detach", and the primary
still had the hope that it could continue as a diskless client.

Anyway.
These corner cases are not really well tested,
because it does not matter that much what exactly you do:
you have lost the last good copy of the data,
you are screwed, and you just cannot recover anyhow.

Maybe the resync target should not panic in this particular situation,
but instead allow the operator to try to rescue something.
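
As a rough illustration of that idea, the decision on the resync target
could be modelled like this. This is purely a toy sketch of a possible
policy, not existing DRBD behaviour: every type, field and function name
below (including the allow_rescue switch) is hypothetical and does not
appear in drbd-0.7.13.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a resync target. */
enum lost_source_action {
    ACTION_CONTINUE_RESYNC,   /* nothing went wrong, keep syncing */
    ACTION_PANIC,             /* the behaviour reported in this thread */
    ACTION_HOLD_FOR_OPERATOR  /* stop, but leave the node up for rescue */
};

/* Hypothetical view of the resync target's state. */
struct resync_target_state {
    bool got_neg_rs_reply;    /* peer could not read a requested block */
    bool local_disk_attached; /* our own (inconsistent) copy is still there */
};

/* Pick an action; allow_rescue stands in for an imagined config option. */
static enum lost_source_action
choose_action(const struct resync_target_state *s, bool allow_rescue)
{
    if (!s->got_neg_rs_reply)
        return ACTION_CONTINUE_RESYNC;
    if (allow_rescue && s->local_disk_attached)
        return ACTION_HOLD_FOR_OPERATOR;
    return ACTION_PANIC;
}
```

The data is just as lost either way; the only difference is that the
operator still gets a running machine to look at the partially synced
disk.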

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.