I have a file server with DRBD in primary mode and a second computer in secondary mode. When I unplug the hard disk from the primary, I want DRBD to detach the disk and go to diskless mode. This works fine if there are no pending I/O operations, but when there are, DRBD waits until ALL of them have finished. The problem is that these operations only end by running into a timeout, so DRBD waits several minutes until all of them are done. While DRBD is waiting I can read from the file server but not write.

I think drbd_io_error in drbd_main.c should cancel all pending I/O requests and release the device immediately. Also, this function should only call drbd_md_write(mdev); when the meta device != backing device; otherwise the call generates even more I/O ops.

Cheers
David Engraf

Syslog:

//Timeout from SATA Harddisk
Dec 23 17:03:44 node2 kernel: ata1: command 0x35 timeout, stat 0xd0 host_stat 0x21
Dec 23 17:03:44 node2 kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 23 17:03:44 node2 kernel: ata1: status=0xd0 { Busy }
Dec 23 17:03:44 node2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Dec 23 17:03:44 node2 kernel: sda: Current: sense key: Aborted Command
Dec 23 17:03:44 node2 kernel: Additional sense: Scsi parity error
Dec 23 17:03:44 node2 kernel: end_request: I/O error, dev sda, sector 37207
Dec 23 17:03:44 node2 kernel: drbd0: Local IO failed. Detaching...   //drbd_io_error
Dec 23 17:03:44 node2 kernel: drbd0: Notified peer that my disk is broken.
.... some minutes later (when all i/o ops are finished)
Dec 23 17:10:44 node2 kernel: drbd0: Releasing backing storage device.
-> after this message you can write to the file server again

cat /proc/drbd
version: 0.7.14 (api:77/proto:74)
SVN Revision: 1989 build by root at KGOStestserver1, 2005-12-02 15:59:00
 0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
    ns:881972 nr:0 dw:25080 dr:856920 al:35 bm:194 lo:6256 pe:0 ua:0 ap:6256
 1: cs:Unconfigured
 2: cs:Unconfigured
 3: cs:Unconfigured
 4: cs:Unconfigured
 5: cs:Unconfigured
 6: cs:Unconfigured
 7: cs:Unconfigured
-> you can see there are 6256 I/O ops pending; it takes a very long time until all of these requests have run into a timeout error.
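
To watch how quickly the pending requests drain while DRBD is detaching, the counters in the /proc/drbd status line can be read programmatically. The following is only a minimal user-space sketch: drbd_counter is a hypothetical helper, not part of DRBD, and it assumes the status line is a space-separated list of key:value tokens as in the output above. Later DRBD documentation describes lo as requests open towards the local I/O subsystem and ap as application requests not yet answered; that this reading also applies to 0.7.14 is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DRBD): look up a numeric
 * "key:value" counter such as "lo" or "ap" in one /proc/drbd
 * status line.
 *
 * Returns 0 and stores the number in *value on success.
 * Returns -1 if the key is absent or is not followed by a number.
 */
static int drbd_counter(const char *line, const char *key, long *value)
{
    size_t klen = strlen(key);
    const char *p = line;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept only whole tokens: at line start or after a space. */
        int at_token_start = (p == line) || (p[-1] == ' ');

        if (at_token_start && p[klen] == ':') {
            const char *num = p + klen + 1;
            char *end;
            long v = strtol(num, &end, 10);

            if (end == num)
                return -1;      /* key found, but value is not numeric */
            *value = v;
            return 0;
        }
        p += klen;              /* substring hit inside another word */
    }
    return -1;
}
```

Called on the device 0 line above with the key "ap", the helper stores 6256 in *value. Polling it in a loop while the disk is being detached would show whether the counter drops steadily or only as each request times out.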