Re: [DRBD-user] detach on io-error failed on pending i/o ops

Fri Dec 23 11:47:23 CET 2005

> From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On behalf of Lars Ellenberg
> Sent: Friday, 23 December 2005 11:02
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] detach on io-error failed on pending i/o ops
> 
> / 2005-12-23 10:07:26 +0100
> \ David Engraf:
> > I have a fileserver with drbd in primary mode and a second computer in
> > secondary mode. Now I unplug the hard disk from the primary and want drbd
> > to detach the disk and go to diskless mode.
> > Everything works fine if there are no pending I/O operations, but when
> > there are pending I/O ops, drbd waits until ALL operations are finished.
> > The problem is that these operations run into timeouts, so drbd waits a
> > few minutes until all of them are finished.
> > While drbd is waiting I can read from the fileserver but not write to it.
> > I think drbd_io_error in drbd_main.c should cancel all pending I/O
> > requests and release the device immediately.
> 
> we cannot possibly cancel requests that we submitted to the io stack
> below us. what if we cancel them, and then later the driver below us
> cancels them, too, but they no longer exist??
> 
> fix the lower level driver to fail these requests faster.
>
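Lars's objection is about ownership: once a request has been submitted to the driver below, only that driver may complete it. A minimal Python model of that rule follows, assuming hypothetical names (`Request`, `LowerDriver`, `Replicator`) that do not correspond to real drbd code. The upper layer counts the requests it has handed down, detach succeeds only once that count reaches zero, and "cancelling" a request from above turns into a double completion when the driver later returns it.

```python
# Hypothetical model of request ownership between a replication layer
# and the block driver below it. Names are illustrative, not drbd's.

class Request:
    def __init__(self, rid):
        self.rid = rid
        self.completed = False
        self.ok = None

    def complete(self, ok):
        # A request may be completed exactly once. A second completion
        # models "cancelling" a request the lower driver still owns.
        if self.completed:
            raise RuntimeError(f"request {self.rid} completed twice")
        self.completed = True
        self.ok = ok


class LowerDriver:
    """Owns every submitted request until it calls the completion callback."""

    def __init__(self):
        self.queue = []

    def submit(self, req, on_done):
        self.queue.append((req, on_done))

    def fail_all(self):
        # Models the driver finally giving up on a dead disk (after its
        # timeouts, or sooner if it is made to fail fast).
        while self.queue:
            req, on_done = self.queue.pop(0)
            on_done(req, False)


class Replicator:
    """Upper layer: tracks local requests still owned by the lower driver."""

    def __init__(self, lower):
        self.lower = lower
        self.pending = 0
        self.attached = True

    def write_local(self, req):
        self.pending += 1
        self.lower.submit(req, self._done)

    def _done(self, req, ok):
        req.complete(ok)
        self.pending -= 1

    def try_detach(self):
        # Releasing the backing device is only safe once every request
        # handed to the lower driver has come back.
        if self.pending == 0:
            self.attached = False
        return not self.attached


if __name__ == "__main__":
    lower = LowerDriver()
    rep = Replicator(lower)
    for i in range(3):
        rep.write_local(Request(i))
    print(rep.try_detach())  # False: 3 requests still pending
    lower.fail_all()
    print(rep.try_detach())  # True: the driver returned everything
```

In this model the only way to shorten the wait is the one Lars suggests: make the lower driver fail its queued requests sooner.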

When the application cancels I/O requests, drbd could also cancel the
pending requests over the network and, if any requests are still pending on
the local device, cancel those too. The problem is that as long as any
requests are pending, I can't write to the fileserver. Is that acceptable?
drbd can operate in diskless mode, so why not use that feature?
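The proposal can be sketched as a minimal Python model, assuming hypothetical names (`Node`, `on_io_error`, `can_release_backing_device`); it does not describe how drbd actually behaves. On an I/O error, new writes stop going to the failed local disk and go to the peer only, while the backing device itself is still held until the old requests come back, which keeps the ownership rule from the previous reply intact. The sketch ignores write ordering and the case where the peer is unreachable too.

```python
# Hypothetical sketch of the proposal above: go diskless at once on an
# I/O error, keep accepting writes via the peer, and release the backing
# device only after the lower driver has returned the old requests.
# Names are illustrative and do not describe drbd's actual behaviour.

class Node:
    def __init__(self):
        self.disk_failed = False
        self.local_pending = 0   # local requests still owned by the lower driver
        self.sent_to_peer = []
        self.sent_to_disk = []

    def on_io_error(self):
        # Mark the disk failed immediately instead of waiting for
        # local_pending to drain.
        self.disk_failed = True

    def write(self, block):
        # Every write is replicated to the peer; it only goes to the
        # local disk while that disk is still considered healthy.
        self.sent_to_peer.append(block)
        if not self.disk_failed:
            self.sent_to_disk.append(block)
            self.local_pending += 1

    def local_completed(self):
        # Called when the lower driver finally returns an old request.
        self.local_pending -= 1

    def can_release_backing_device(self):
        return self.disk_failed and self.local_pending == 0


if __name__ == "__main__":
    node = Node()
    node.write("a")
    node.write("b")
    node.on_io_error()
    node.write("c")  # not blocked: goes to the peer only
    print(node.sent_to_disk, node.sent_to_peer)  # ['a', 'b'] ['a', 'b', 'c']
    print(node.can_release_backing_device())     # False: 2 requests outstanding
```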

> > Also, in this function you should only call drbd_md_write(mdev); when
> > the meta device != backing device; otherwise this function generates
> > even more I/O ops.
> 
> since the meta data flags are important,
> we at least want to try to write them.
> 
> but yes, there is room for optimization.
> 
> > Cheers
> > David Engraf
> >
> >
> > Syslog:
> > //Timeout from SATA hard disk
> > Dec 23 17:03:44 node2 kernel: ata1: command 0x35 timeout, stat 0xd0 host_stat 0x21
> > Dec 23 17:03:44 node2 kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> > Dec 23 17:03:44 node2 kernel: ata1: status=0xd0 { Busy }
> > Dec 23 17:03:44 node2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
> > Dec 23 17:03:44 node2 kernel: sda: Current: sense key: Aborted Command
> > Dec 23 17:03:44 node2 kernel:     Additional sense: Scsi parity error
> > Dec 23 17:03:44 node2 kernel: end_request: I/O error, dev sda, sector 37207
> >
> > Dec 23 17:03:44 node2 kernel: drbd0: Local IO failed. Detaching...
> > //drbd_io_error
> > Dec 23 17:03:44 node2 kernel: drbd0: Notified peer that my disk is broken.
> >
> > ... a few minutes later (once all I/O ops are finished)
> >
> > Dec 23 17:10:44 node2 kernel: drbd0: Releasing backing storage device.
> >
> > -> after this message you can write to the fileserver again
> 
> so what.
> be happy that this is possible at all.
> If you had no drbd, the box would probably have panicked
> or otherwise gotten completely screwed up.

Yes, I'm happy, but that's exactly why I use drbd.

David Engraf
