Hello,

One of my servers locks up with these disk errors. It still answers pings, but cannot use its disks. I suspected a hardware error, but IBM service replaced the controller and the SAS backplane and the error still persists. Sometimes it happens during the kernel's initrd script, so it is probably not DRBD related. But maybe somebody on this list can give me a hint about what to look for.

OS: Red Hat 5 x86_64, kernel 2.6.18-164.6.1.el5
Hardware: IBM x3250
Controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
2 disks: IBM-ESXS Model: ST3300655SS Rev: BA26

Thanks in advance,
Matthias

May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff810037fdf080, mf = ffff81007ed09700, idx=de
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff810037fdf200, mf = ffff81007ed09780, idx=df
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5e40, mf = ffff81007ed09800, idx=e0
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5cc0, mf = ffff81007ed09880, idx=e1
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5b40, mf = ffff81007ed09900, idx=e2
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5840, mf = ffff81007ed09a00, idx=e4
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a56c0, mf = ffff81007ed09a80, idx=e5
[ lots more such messages ]
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea18200, mf = ffff81007ed0ea80, idx=185
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea1b380, mf = ffff81007ed0eb00, idx=186
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007d659380, mf = ffff81007ed02880, idx=1
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007d659680, mf = ffff81007ed02900, idx=2
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa39c0, mf = ffff81007ed02980, idx=3
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa3cc0, mf = ffff81007ed02a00, idx=4
[ ... ]
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa36c0, mf = ffff81007ed03700, idx=1e
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa3e40, mf = ffff81007ed12700, idx=1fe
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
May 31 09:10:47 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:10:47 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:11:15 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:12:18 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ea1b380)
May 31 09:12:28 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:12:28 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea1b380, mf = ffff81007ed0e380, idx=177
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff81007ea1b380)
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007ea1b380)
May 31 09:12:56 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: target reset: FAILED (sc=ffff81007ea1b380)
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007ea1b380)
May 31 09:13:23 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: bus reset: FAILED (sc=ffff81007ea1b380)
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting host reset! (sc=ffff81007ea1b380)
May 31 09:13:44 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:14:05 xxx.yyy.113.1 kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff81007ea1b380)
May 31 09:40:58 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810037fdf200)
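
With a log this repetitive, a quick tally can make it easier to see how often the IOC faults and how far the SCSI error handler has to escalate (task abort, target reset, bus reset, host reset) before something succeeds. What follows is only a minimal sketch in Python: the regular expressions are modelled on the message formats quoted above, and the names `FAULT_RE`, `RESULT_RE`, `summarize` and `SAMPLE` are made up for this illustration and are not part of any driver tooling. Other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Pattern for the fault announcement that carries the fault code,
# e.g. "mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!".
FAULT_RE = re.compile(
    r"mptbase: (ioc\d+): WARNING - IOC is in FAULT state \(([0-9a-fA-F]+)h\)"
)

# Pattern for the outcome of one error-handling step,
# e.g. "mptscsih: ioc0: task abort: FAILED (sc=ffff81007ea1b380)".
RESULT_RE = re.compile(
    r"mptscsih: ioc\d+: (task abort|target reset|bus reset|host reset): "
    r"(SUCCESS|FAILED) \(sc=([0-9a-f]+)\)"
)


def summarize(lines):
    """Return (fault counts per (ioc, code), list of (step, result, sc))."""
    faults = Counter()
    outcomes = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults[(m.group(1), m.group(2))] += 1
            continue
        m = RESULT_RE.search(line)
        if m:
            outcomes.append((m.group(1), m.group(2), m.group(3)))
    return faults, outcomes


# A few lines copied from the log above, used only as sample input.
SAMPLE = """\
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff81007ea1b380)
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: target reset: FAILED (sc=ffff81007ea1b380)
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: bus reset: FAILED (sc=ffff81007ea1b380)
May 31 09:14:05 xxx.yyy.113.1 kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff81007ea1b380)
"""

if __name__ == "__main__":
    faults, outcomes = summarize(SAMPLE.splitlines())
    for (ioc, code), count in faults.items():
        print(f"{ioc}: fault code {code}h seen {count} time(s)")
    for step, result, sc in outcomes:
        print(f"{step:12s} {result:8s} sc={sc}")
```

To run it against a real log, pass an open file instead of the sample, for example `summarize(open("/var/log/messages"))`; the path is an assumption and depends on the syslog configuration.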