Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
/ 2005-10-06 17:20:18 +0900
\ HIROSE, Masaaki:
> Hello,
>
> drbd_worker.c in drbd-0.7.13:
>
>     589 int w_e_end_rsdata_req(drbd_dev *mdev, struct drbd_work *w, int cancel)
>     (snip)
>     613         } else {
>   * 614                 ok=drbd_send_ack(mdev,NegRSDReply,e);
>     615                 if (DRBD_ratelimit(5*HZ,5))
>   * 616                         ERR("Sending NegDReply. I guess it gets messy.\n");
>     617                 drbd_io_error(mdev);
>     618         }
>
> drbd_send_ack() sends NegRSDReply but ERR says NegDReply. which is correct?

well, NegRSDReply.
code is correct, error message has a copy'n'paste error.

> a disk error occurred on the primary drbd server (on-io-error: detach).
> the primary server says:
>
>   kernel: drbd1: Local IO failed. Detaching...
>   kernel: drbd1: Sending NegDReply. I guess it gets messy.
>   kernel: drbd1: Notified peer that my disk is broken.
>
> and the secondary got NegRSDReply and says:
>
>   kernel: drbd1: Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk.
>
> and the secondary server does drbd_panic.
>
> Why does drbd do a kernel panic on the secondary (no error), instead of
> on the primary (disk error)?

because you probably configured "on-io-error = detach", and the primary
still had the hope it could continue as a diskless client?

anyways. these corner cases are not really well tested, because it does
not matter that much what exactly you do: the primary was the sync source,
so its disk held the only up-to-date copy. once that disk fails during
resync, you have lost the last good copy of the data, you are screwed,
and you just cannot recover anyhow.

maybe the resync target should not panic in this particular situation,
but allow for the operator to try and rescue something else.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
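
The copy'n'paste mismatch discussed above (NegRSDReply is sent, but the log text says NegDReply) can be avoided by building the log text from the same value that goes on the wire. The following is only a minimal userspace sketch of that idea, not DRBD code: the enum `packet`, `packet_name()` and `format_neg_ack_msg()` are hypothetical stand-ins invented for illustration, and only the two packet names and the message wording are taken from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two packet types named in the thread.
 * These are NOT the DRBD kernel definitions. */
enum packet { NegDReply, NegRSDReply };

/* Map a packet type to the name that should appear in the log. */
static const char *packet_name(enum packet p)
{
    switch (p) {
    case NegDReply:   return "NegDReply";
    case NegRSDReply: return "NegRSDReply";
    }
    return "unknown";
}

/* Build the log line from the packet value that is actually being sent,
 * so the text cannot drift from the code by copy and paste.
 * Returns what snprintf returns (length that would have been written). */
static int format_neg_ack_msg(char *buf, size_t len, enum packet p)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Sending %s. I guess it gets messy.",
                    packet_name(p));
}
```

A caller would pass the same `enum packet` value both to its send routine and to `format_neg_ack_msg()`. In the situation from the thread that value is `NegRSDReply`, so the log line would name NegRSDReply, matching what the peer reports receiving.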