Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
> > When the application cancels i/o requests, the drbd can also cancel
> > the pending requests over the net and should, if there are any more
> > requests on the local device, cancel them too.
>
> I don't see it is that easy.
> If you see something which I happen to overlook, please provide a patch,
> or "proof of concept" or the like.
>
> > The problem is as long as there are any pending requests, I can't
> > write to the fileserver...is this ok??
>
> this is not coded in drbd.
> I did not observe this either, when I did "detach tests".

Yes, when I do a "soft" unplug test with "drbdadm detach all" it works
fine. This is because the hard disk is not really broken and the
outstanding i/o ops were acked after the disk was detached. But when I
do a "hard" unplug, the i/o ops don't get acked (you can see this in
/proc/drbd under "lo:"), and drbd hangs in the function drbd_io_error
and waits for completion (mdev->local_cnt == 0). As long as
drbd_free_ll_dev is not called, it seems that drbd cannot write to the
device any more.

> > I think drbd can operate in diskless state, so why not use this
> > feature?
>
> we use it.
>
> some of the relevant code is in drbd_io_error() in drbd_main.c, and as I
> read the code, there should be no more than one second between the two
> messages
>     WARN("Notified peer that my disk is broken.\n");
> and either
>     WARN("Not releasing backing storage device.\n");
> or
>     WARN("Releasing backing storage device.\n");
> unless this is time spent in drbd_md_write(mdev)...
> and I won't throw away this call. it should not block anything, either.
> but maybe we can at least add some "fail fast do not retry" flag here,
> if the lower level drivers support it.

Yes, drbd_md_write waits a long time. This is because it adds a new i/o
write operation to the queue; due to the timeouts of the previous i/os,
it takes some minutes until this request gets its timeout answer.
Now that I have reconfigured my drbd to use another disk for the meta
file, drbd doesn't hang and these two messages come within the same
second.

> it likely is not drbd who blocks further IO, but the file system (which
> waits for some journal write to complete).
>
> imho, the solution really is to make the lower level driver
> fail requests faster.

That would be the best thing.

> I may be wrong of course, maybe there is a simple solution on the
> drbd layer, even if it is not our fault. but currently I don't see it...

David Engraf
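[Editor's note: the workaround David describes, putting the metadata on another disk, corresponds to an external meta-disk in drbd.conf. A hedged sketch only; the resource name, device paths, addresses, and index are assumptions, with syntax in the style of the drbd 0.7 series:]

```
resource r0 {
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda1;        # data on the (possibly failing) disk
    meta-disk /dev/sdb1[0];     # metadata on a separate physical disk
    address   10.0.0.1:7788;
  }
}
```

With the metadata elsewhere, drbd_md_write no longer queues behind the stuck requests on the broken device, which is why the two WARN messages then appear within the same second.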