I have a fileserver running drbd in primary mode and a second computer in
secondary mode. When I unplug the hard disk from the primary, I want drbd
to detach the disk and switch to diskless mode.
This works fine if there are no pending i/o operations, but when there
are pending i/o ops, drbd waits until ALL of them have finished. The
problem is that each of these operations runs into a timeout, so drbd waits
several minutes (about seven in the log below) until all of them are done.
While drbd is waiting I can read from the fileserver but not write.
I think drbd_io_error in drbd_main.c should cancel all pending i/o requests
and release the device immediately. This function should also call
drbd_md_write(mdev) only when the meta device != backing device; otherwise
the call itself generates even more i/o ops.
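
To make the proposal concrete, here is a minimal user-space sketch of the
behaviour I have in mind. It is NOT drbd code: struct model_dev,
model_cancel_pending and model_io_error are hypothetical names invented for
this illustration, and the counter md_writes only stands in for a call to
drbd_md_write(mdev).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these types exist in drbd. */
enum req_state { REQ_PENDING, REQ_DONE, REQ_FAILED };

struct model_dev {
    enum req_state *reqs;  /* requests submitted to the local disk */
    size_t nreqs;
    int backing_dev_id;    /* id of the data (backing) device */
    int meta_dev_id;       /* id of the device holding the meta data */
    bool diskless;
    int md_writes;         /* number of meta-data writes issued */
};

/* Fail every still-pending request instead of waiting for its timeout.
 * Returns the number of requests that were cancelled. */
size_t model_cancel_pending(struct model_dev *dev)
{
    size_t cancelled = 0;

    for (size_t i = 0; i < dev->nreqs; i++) {
        if (dev->reqs[i] == REQ_PENDING) {
            dev->reqs[i] = REQ_FAILED;
            cancelled++;
        }
    }
    return cancelled;
}

/* Model of the proposed error path: cancel pending requests, skip the
 * meta-data write when it would land on the failed disk, then detach. */
size_t model_io_error(struct model_dev *dev)
{
    assert(dev != NULL);

    size_t cancelled = model_cancel_pending(dev);

    if (dev->meta_dev_id != dev->backing_dev_id)
        dev->md_writes++;  /* stands in for drbd_md_write(mdev) */

    dev->diskless = true;
    return cancelled;
}
```

With the meta data on the same disk as the data (equal ids), the model
issues no meta-data write and goes diskless right away; how the real request
queue could be cancelled safely inside the kernel is not covered here.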
Cheers
David Engraf
Syslog:
//Timeout from SATA Harddisk
Dec 23 17:03:44 node2 kernel: ata1: command 0x35 timeout, stat 0xd0 host_stat 0x21
Dec 23 17:03:44 node2 kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 23 17:03:44 node2 kernel: ata1: status=0xd0 { Busy }
Dec 23 17:03:44 node2 kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Dec 23 17:03:44 node2 kernel: sda: Current: sense key: Aborted Command
Dec 23 17:03:44 node2 kernel: Additional sense: Scsi parity error
Dec 23 17:03:44 node2 kernel: end_request: I/O error, dev sda, sector 37207
Dec 23 17:03:44 node2 kernel: drbd0: Local IO failed. Detaching...
//drbd_io_error
Dec 23 17:03:44 node2 kernel: drbd0: Notified peer that my disk is broken.
.... some minutes later (when all i/o ops are finished)
Dec 23 17:10:44 node2 kernel: drbd0: Releasing backing storage device.
-> after this message you can write again on the fileserver
cat /proc/drbd
version: 0.7.14 (api:77/proto:74)
SVN Revision: 1989 build by root at KGOStestserver1, 2005-12-02 15:59:00
0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
ns:881972 nr:0 dw:25080 dr:856920 al:35 bm:194 lo:6256 pe:0 ua:0 ap:6256
1: cs:Unconfigured
2: cs:Unconfigured
3: cs:Unconfigured
4: cs:Unconfigured
5: cs:Unconfigured
6: cs:Unconfigured
7: cs:Unconfigured
-> you can see there are 6256 i/o ops pending (lo:6256, ap:6256); clearing
them takes a very long time because every request runs into a timeout error.
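
To watch these counters while the disk is failing, a small helper can pull a
single value out of the counter line. This is only a sketch:
drbd_counter is a hypothetical helper name, not part of drbd, and it assumes
the "key:value" layout shown in the /proc/drbd output above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of "key:<number>" in a /proc/drbd counter line,
 * or -1 if the key is absent or has no number after the colon. */
long drbd_counter(const char *line, const char *key)
{
    assert(line != NULL && key != NULL);

    size_t klen = strlen(key);
    const char *p = line;

    if (klen == 0)
        return -1;

    while ((p = strstr(p, key)) != NULL) {
        /* The key must start the line or follow a space. */
        int at_start = (p == line) || (p[-1] == ' ');

        if (at_start && p[klen] == ':') {
            char *end;
            long v = strtol(p + klen + 1, &end, 10);
            return (end == p + klen + 1) ? -1 : v;
        }
        p += klen;
    }
    return -1;
}
```

Called with the counter line above and the key "lo" or "ap", it yields the
number of requests still outstanding, which could be polled until it drops
to zero.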