[DRBD-user] detach on io-error failed on pending i/o ops

David Engraf engraf.david at netcom-sicherheitstechnik.de
Fri Dec 23 10:07:26 CET 2005

I have a fileserver running drbd in primary mode and a second computer in
secondary mode. When I unplug the hard disk from the primary, I want drbd
to detach the disk and go to diskless mode.
This works fine if there are no pending i/o operations, but when there
are pending i/o ops drbd waits until ALL of them have finished. The
problem is that these operations all have to run into a timeout, so drbd
waits several minutes until they are done.
While drbd is waiting I can read from the fileserver but not write to it.
I think drbd_io_error in drbd_main.c should cancel all pending i/o requests
and release the device immediately. Also, this function should only
call drbd_md_write(mdev); when the meta device != backing device, because
otherwise the call itself generates more i/o ops on the failed disk.
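
To make the proposal concrete, here is a toy model in Python. It is not DRBD
code: ToyDevice, on_io_error and the device paths are hypothetical names made
up for illustration, and the real drbd_io_error works on kernel structures.
The sketch only shows the two suggested rules: fail every pending request at
once, and skip the metadata write when the metadata lives on the failed disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical stand-in for a drbd device; not the real kernel structure."""
    backing_dev: str
    meta_dev: str
    pending: list = field(default_factory=list)  # ids of in-flight local requests
    diskless: bool = False
    md_writes: int = 0


def on_io_error(dev: ToyDevice) -> list:
    """Model of the proposed behaviour on a local i/o error."""
    # Complete every in-flight request with an error instead of waiting
    # for each one to time out on the broken disk.
    failed = list(dev.pending)
    dev.pending.clear()
    # Write metadata only if it is stored on a different device than the
    # failed disk; otherwise the write would queue more i/o against it.
    if dev.meta_dev != dev.backing_dev:
        dev.md_writes += 1
    dev.diskless = True
    return failed


internal = ToyDevice("/dev/sda1", "/dev/sda1", pending=[1, 2, 3])
failed_internal = on_io_error(internal)
external = ToyDevice("/dev/sda1", "/dev/sdb1", pending=[4])
failed_external = on_io_error(external)
print(failed_internal, internal.md_writes, external.md_writes)
```

With the metadata on the same disk the model goes diskless without a single
extra write; with external metadata it does exactly one.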

David Engraf

//Timeout from SATA Harddisk
Dec 23 17:03:44 node2 kernel: ata1: command 0x35 timeout, stat 0xd0 host_stat 0x21
Dec 23 17:03:44 node2 kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 23 17:03:44 node2 kernel: ata1: status=0xd0 { Busy }
Dec 23 17:03:44 node2 kernel: sd 0:0:0:0: SCSI error: return code =
Dec 23 17:03:44 node2 kernel: sda: Current: sense key: Aborted Command
Dec 23 17:03:44 node2 kernel:     Additional sense: Scsi parity error
Dec 23 17:03:44 node2 kernel: end_request: I/O error, dev sda, sector 37207

Dec 23 17:03:44 node2 kernel: drbd0: Local IO failed. Detaching...
Dec 23 17:03:44 node2 kernel: drbd0: Notified peer that my disk is broken.

.... some minutes later (when all i/o ops are finished)

Dec 23 17:10:44 node2 kernel: drbd0: Releasing backing storage device.

-> after this message you can write to the fileserver again
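
The length of the stall can be read off the two drbd0 messages above. A small
Python sketch that does the subtraction is shown here; the function names are
made up for illustration, and the year is an assumption taken from the date of
this post, since syslog lines do not carry one.

```python
from datetime import datetime

# The two drbd0 lines quoted from the kernel log above.
LOG = """\
Dec 23 17:03:44 node2 kernel: drbd0: Local IO failed. Detaching...
Dec 23 17:10:44 node2 kernel: drbd0: Releasing backing storage device.
"""


def stamp(line: str, year: int = 2005) -> datetime:
    """Parse the leading syslog timestamp; the year is assumed, not logged."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def detach_delay(log: str) -> float:
    """Seconds between the i/o failure and the release of the backing device."""
    start = end = None
    for line in log.splitlines():
        if "Local IO failed" in line:
            start = stamp(line)
        elif "Releasing backing storage device" in line:
            end = stamp(line)
    if start is None or end is None:
        raise ValueError("both drbd messages are needed")
    return (end - start).total_seconds()


delay = detach_delay(LOG)
print(delay)  # 420.0
```

That is seven minutes during which writes to the fileserver were blocked.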

cat /proc/drbd

version: 0.7.14 (api:77/proto:74)
SVN Revision: 1989 build by root at KGOStestserver1, 2005-12-02 15:59:00
 0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
    ns:881972 nr:0 dw:25080 dr:856920 al:35 bm:194 lo:6256 pe:0 ua:0 ap:6256
 1: cs:Unconfigured
 2: cs:Unconfigured
 3: cs:Unconfigured
 4: cs:Unconfigured
 5: cs:Unconfigured
 6: cs:Unconfigured
 7: cs:Unconfigured

-> you can see there are about 6256 i/o ops pending (lo and ap); it takes a
very long time until every one of these requests has run into a timeout error.
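
For anyone who wants to watch these counters while reproducing the problem,
the counter line of /proc/drbd can be split into key:value pairs. The Python
sketch below does that for the line quoted above; parse_counters is a
hypothetical helper name. As far as I understand the DRBD documentation, lo
counts open requests to the local i/o subsystem and ap counts requests the
application has handed to drbd that are not yet answered, but treat that
reading as an assumption.

```python
import re

# Counter line for device 0, copied from the /proc/drbd output above.
STATUS = "    ns:881972 nr:0 dw:25080 dr:856920 al:35 bm:194 lo:6256 pe:0 ua:0 ap:6256"


def parse_counters(line: str) -> dict:
    """Turn 'key:number' pairs from a /proc/drbd counter line into a dict."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}


counters = parse_counters(STATUS)
print(counters["lo"], counters["ap"])  # 6256 6256
```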
