[DRBD-user] detach on io-error failed on pending i/o ops

Fri Dec 23 11:01:36 CET 2005

/ 2005-12-23 10:07:26 +0100
\ David Engraf:
> I have a fileserver running drbd in primary mode and a second computer in
> secondary mode. When I unplug the hard disk from the primary, I want drbd
> to detach the disk and go to diskless mode.
> Everything works fine if there are no pending i/o operations, but when there
> are pending i/o ops, drbd waits until ALL of them have finished. The
> problem is that these operations only finish by running into their timeout,
> so drbd waits a few minutes until all of them are done.
> While drbd is waiting I can read from the fileserver but not write.
> I think drbd_io_error in drbd_main.c should cancel all pending i/o requests
> and free the device immediately.

we cannot possibly cancel requests that we submitted to the io stack
below us. what if we cancel them, and then later the driver below us
cancels them, too, but they no longer exist??

fix the lower level driver to fail these requests faster.
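
to illustrate why the detach has to wait, here is a minimal toy model in C.
it is not drbd code; struct toy_dev and the toy_* functions are hypothetical
names made up for this sketch. the upper layer only counts what it has
submitted, and releases the backing device once the lower layer has
completed (or failed) every single request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not DRBD code. */
struct toy_dev {
    int  pending;   /* requests submitted below us and not yet completed */
    bool io_error;  /* a local I/O error was seen, detach is wanted */
    bool attached;  /* we still hold the backing device */
};

/* Hand one request to the lower layer; from now on it owns the request. */
static void toy_submit(struct toy_dev *d)
{
    assert(d->attached);
    d->pending++;
}

/* Release the backing device only when nothing is in flight any more. */
static void toy_try_release(struct toy_dev *d)
{
    if (d->io_error && d->pending == 0)
        d->attached = false;
}

/* Called by the lower layer exactly once per request, success or failure. */
static void toy_complete(struct toy_dev *d, bool failed)
{
    assert(d->pending > 0);  /* a completion for a forgotten request is a bug */
    d->pending--;
    if (failed)
        d->io_error = true;
    toy_try_release(d);
}
```

in this model the first failed completion only marks the error; the release
happens on the last completion. how long that takes depends entirely on how
fast the lower layer fails the remaining requests.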

> Also, this function should only
> call drbd_md_write(mdev) when the meta device != backing device; otherwise
> it generates even more i/o ops.

since the meta data flags are important,
we at least want to try to write them.

but yes, there is room for optimization.
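
a sketch of what such a check could look like, as a self-contained C model.
this is an assumption for illustration only: struct toy_conf, its fields and
both functions are hypothetical and do not reflect the real drbd structures
or the actual drbd_io_error code.

```c
#include <stdbool.h>

/* Hypothetical model, not DRBD's real data structures. */
struct toy_conf {
    int backing_dev;  /* id of the device holding the data */
    int meta_dev;     /* id of the device holding the meta data */
};

/* True if the meta data lives on the same device as the failed data disk. */
static bool toy_md_on_failed_disk(const struct toy_conf *c)
{
    return c->meta_dev == c->backing_dev;
}

/*
 * Decide whether to attempt the meta data write after a local I/O error.
 * With skip_if_same_disk == false the write is always tried, because the
 * flags matter. With skip_if_same_disk == true the write is skipped when
 * it would only queue more I/O on the disk that just failed.
 */
static bool toy_should_write_md(const struct toy_conf *c, bool skip_if_same_disk)
{
    if (skip_if_same_disk && toy_md_on_failed_disk(c))
        return false;
    return true;
}
```

with the meta data on a separate device the write is attempted either way;
only the same-disk case differs between the two policies.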

> Cheers
> David Engraf
> 
> 
> Syslog:
> //Timeout from SATA Harddisk
> Dec 23 17:03:44 node2 kernel: ata1: command 0x35 timeout, stat 0xd0 host_stat 0x21
> Dec 23 17:03:44 node2 kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> Dec 23 17:03:44 node2 kernel: ata1: status=0xd0 { Busy }
> Dec 23 17:03:44 node2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
> Dec 23 17:03:44 node2 kernel: sda: Current: sense key: Aborted Command
> Dec 23 17:03:44 node2 kernel:     Additional sense: Scsi parity error
> Dec 23 17:03:44 node2 kernel: end_request: I/O error, dev sda, sector 37207
> 
> Dec 23 17:03:44 node2 kernel: drbd0: Local IO failed. Detaching...
> //drbd_io_error
> Dec 23 17:03:44 node2 kernel: drbd0: Notified peer that my disk is broken.
> 
> .... some minutes later (when all i/o ops are finished)
> 
> Dec 23 17:10:44 node2 kernel: drbd0: Releasing backing storage device.
> 
> -> after this message you can write again on the fileserver

so what.
be happy that this is possible at all.
without drbd, the box would probably have panicked
or otherwise completely screwed up.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :