On Tue, Dec 23, 2008 at 09:31:31AM -0500, Roger Tsang wrote:
> Hi,
>
> I'm using drbd-8.2.7 with "common { disk { on-io-error detach; } }" and see
> "drbd3: Local READ failed..." messages even after logs show drbd3 disk state
> changed to Diskless. It seems drbd did not detached the local drbd3 disk.
>
> It is causing load average to increase beyond 40 and the file system stacked on
> drbd3 to stall waiting for I/O (unacceptable).
>
> If not a bug can there be an option to emulate drbd-0.7 behavior to detach
> local disk immediately on I/O error?

nothing to "emulate" there, as drbd 8 _does_ detach immediately.

until proven otherwise I'd say these are coming from requests
already submitted (before drbd "detached"), but not yet completed.

> Dec 22 10:25:38 node1 kernel: ata2: command 0x25 timeout, stat 0xd0 host_stat
> 0x1
> Dec 22 10:25:38 node1 kernel: ata2: status=0xd0 { Busy }
> Dec 22 10:25:38 node1 kernel: SCSI error : <1 0 0 0> return code = 0x8000002
> Dec 22 10:25:38 node1 kernel: sdb: Current: sense key: Aborted Command
> Dec 22 10:25:38 node1 kernel: Additional sense: Scsi parity error
> Dec 22 10:25:38 node1 kernel: end_request: I/O error, dev sdb, sector 4057363
> Dec 22 10:25:38 node1 kernel: drbd3: got an _req_mod() errno of -5
> Dec 22 10:25:38 node1 kernel: drbd3: Local READ failed sec=1952848s size=4096
> Dec 22 10:25:38 node1 kernel: drbd3: disk( UpToDate -> Failed )
> Dec 22 10:25:38 node1 kernel: drbd3: Local IO failed. Detaching...
> Dec 22 10:25:38 node1 kernel: ATA: abnormal status 0xD0 on port 0xE007
> Dec 22 10:25:38 node1 last message repeated 2 times
> Dec 22 10:25:38 node1 kernel: drbd3: disk( Failed -> Diskless )
> Dec 22 10:25:38 node1 kernel: drbd3: Notified peer that my disk is broken.
> ...

what does /proc/drbd look like now?

> Dec 22 10:33:07 node1 watchdog[68054]: loadavg 37 24 12 is higher than the
> given threshold 36 27 18!
> Dec 22 10:33:07 node1 watchdog[68054]: shutting down the system because of
> error -3
> Dec 22 10:33:08 node1 kernel: ata2: command 0x25 timeout, stat 0xd0 host_stat
> 0x1
> Dec 22 10:33:08 node1 kernel: ata2: status=0xd0 { Busy }
> Dec 22 10:33:08 node1 kernel: SCSI error : <1 0 0 0> return code = 0x8000002
> Dec 22 10:33:08 node1 kernel: sdb: Current: sense key: Aborted Command
> Dec 22 10:33:08 node1 kernel: Additional sense: Scsi parity error
> Dec 22 10:33:08 node1 kernel: end_request: I/O error, dev sdb, sector 235310987
> Dec 22 10:33:08 node1 kernel: drbd3: got an _req_mod() errno of -5
> Dec 22 10:33:08 node1 kernel: drbd3: Local READ failed sec=233206472s size=4096
> Dec 22 10:33:08 node1 kernel: ATA: abnormal status 0xD0 on port 0xE007
> Dec 22 10:33:08 node1 last message repeated 2 times
> ...
> Shutdown/reboot with sync took _very_ long; gets stuck waiting for drbd3!
> ...

did it finish, or did you need to hard-reset?

> Dec 22 11:13:10 node1 kernel: end_request: I/O error, dev sdb, sector 449904539
> Dec 22 11:13:10 node1 kernel: drbd3: got an _req_mod() errno of -5
> Dec 22 11:13:10 node1 kernel: drbd3: Local READ failed sec=447800024s size=4096
> ...
> Dec 22 11:14:10 node1 kernel: end_request: I/O error, dev sdb, sector 180695108
> Dec 22 11:14:10 node1 kernel: drbd3: got an _req_mod() errno of -5
> Dec 22 11:14:10 node1 kernel: drbd3: Local WRITE failed sec=178590593s size=512
> ...
> Dec 22 11:20:37 node1 syslogd 1.4.1: restart (remote reception).
> Dec 22 11:20:37 node1 syslog: syslogd startup succeeded

what exactly is your sdb, and what happened to it?

> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> Life on your PC is safer, easier, and more enjoyable with Windows Vista . See
> how

now, is that so.
really.
;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__ please don't Cc me, but send to list -- I'm subscribed
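
One way to see how many local failures were logged after the detach, and how long after it, is to filter the syslog for them. The following is only a rough sketch, not a DRBD tool: the function name `failures_after_diskless` is made up for illustration, and the regular expressions assume the exact message formats quoted in this thread (drbd-8.2.7 kernel messages in classic syslog format).

```python
import re
import sys

# Patterns assumed from the log lines quoted in this thread.
STATE_RE = re.compile(r"kernel: (drbd\d+): disk\( (\w+) -> (\w+) \)")
FAIL_RE = re.compile(
    r"kernel: (drbd\d+): Local (READ|WRITE) failed sec=(\d+)s size=(\d+)"
)
TIME_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d)")


def failures_after_diskless(lines):
    """Return (timestamp, device, op, sector) for every local I/O failure
    logged while the device's last reported disk state was Diskless."""
    diskless = set()   # devices whose most recent disk state is Diskless
    late = []
    for line in lines:
        ts_match = TIME_RE.match(line)
        ts = ts_match.group(1) if ts_match else ""
        state = STATE_RE.search(line)
        if state:
            dev, _old, new = state.groups()
            if new == "Diskless":
                diskless.add(dev)
            else:
                diskless.discard(dev)
            continue
        fail = FAIL_RE.search(line)
        if fail and fail.group(1) in diskless:
            late.append((ts, fail.group(1), fail.group(2), int(fail.group(3))))
    return late


if __name__ == "__main__":
    # Hypothetical usage: python late_failures.py < /var/log/messages
    for ts, dev, op, sector in failures_after_diskless(sys.stdin):
        print(f"{ts} {dev} {op} sec={sector}")
```

A handful of late entries shortly after the state change would fit requests that were already in flight when drbd detached. The script cannot tell when a request was submitted, only when its failure was logged, so it narrows the question rather than answering it.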