Hello,
One of my servers locks up with the disk errors below.
It still answers pings, but can no longer access its disks.
I suspected a hardware fault, but IBM service replaced the controller
and the SAS backplane, and the error still persists.
It sometimes occurs while the kernel's initrd script is running, so it is
probably not DRBD-related. Still, maybe somebody on this list can give me
a hint on what to look for.
OS is Red Hat 5 x86_64, kernel 2.6.18-164.6.1.el5
Hardware is IBM x3250
Controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
2 disks: IBM-ESXS Model: ST3300655SS Rev: BA26
Thanks in advance
Matthias
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff810037fdf080, mf = ffff81007ed09700, idx=de
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff810037fdf200, mf = ffff81007ed09780, idx=df
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5e40, mf = ffff81007ed09800, idx=e0
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5cc0, mf = ffff81007ed09880, idx=e1
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5b40, mf = ffff81007ed09900, idx=e2
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5840, mf = ffff81007ed09a00, idx=e4
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a56c0, mf = ffff81007ed09a80, idx=e5
[ many more such messages ]
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea18200, mf = ffff81007ed0ea80, idx=185
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea1b380, mf = ffff81007ed0eb00, idx=186
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007d659380, mf = ffff81007ed02880, idx=1
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007d659680, mf = ffff81007ed02900, idx=2
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa39c0, mf = ffff81007ed02980, idx=3
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa3cc0, mf = ffff81007ed02a00, idx=4
[ ... ]
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa36c0, mf = ffff81007ed03700, idx=1e
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa3e40, mf = ffff81007ed12700, idx=1fe
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - mpt_fault_reset_work: HardReset: success
May 31 09:10:47 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:10:47 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:11:15 xxx.yyy.113.1 kernel: mptbase: ioc0: LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:12:18 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ea1b380)
May 31 09:12:28 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:12:28 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea1b380, mf = ffff81007ed0e380, idx=177
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff81007ea1b380)
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007ea1b380)
May 31 09:12:56 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: target reset: FAILED (sc=ffff81007ea1b380)
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007ea1b380)
May 31 09:13:23 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: bus reset: FAILED (sc=ffff81007ea1b380)
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting host reset! (sc=ffff81007ea1b380)
May 31 09:13:44 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:14:05 xxx.yyy.113.1 kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff81007ea1b380)
May 31 09:40:58 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810037fdf200)
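When comparing several lockups, it may help to pull the recurring fields out of these messages. The Python sketch below decodes the 32-bit `LogInfo` word and tallies the IOC fault codes and the error-handler steps (task abort, target reset, bus reset, host reset) seen in a syslog file. It assumes the bit layout that the Linux `mptbase` driver appears to use for SAS controllers (4-bit bus type, 4-bit originator, 8-bit code, 16-bit subcode); the script name and function names are hypothetical, and the regular expressions assume each syslog message sits on a single unwrapped line.

```python
import re
import sys
from collections import Counter

# ASSUMED LogInfo layout (as decoded by the Linux mptbase driver for SAS):
# bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 subcode.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

# Patterns matching the message texts seen in the log above.
LOGINFO_RE = re.compile(r"LogInfo\((0x[0-9a-fA-F]+)\)")
FAULT_RE = re.compile(r"FAULT code = ([0-9a-fA-F]+)h")
STEP_RE = re.compile(r"attempting (task abort|target reset|bus reset|host reset)")


def decode_loginfo(value):
    """Split a 32-bit Fusion-MPT LogInfo word into its bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def summarize(lines):
    """Collect LogInfo words, IOC fault codes and SCSI error-handler steps.

    Expects one unwrapped syslog message per line.
    """
    loginfos, faults, steps = [], [], []
    for line in lines:
        m = LOGINFO_RE.search(line)
        if m:
            loginfos.append(int(m.group(1), 16))
        m = FAULT_RE.search(line)
        if m:
            faults.append(int(m.group(1), 16))
        m = STEP_RE.search(line)
        if m:
            steps.append(m.group(1))
    return loginfos, faults, steps


if __name__ == "__main__":
    # Hypothetical usage: python mpt_summary.py /var/log/messages
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            loginfos, faults, steps = summarize(fh)
        for word, count in Counter(loginfos).items():
            print(f"LogInfo 0x{word:08x} x{count}: {decode_loginfo(word)}")
        for code, count in Counter(faults).items():
            print(f"IOC FAULT code {code:04x}h x{count}")
        print("error-handler steps:", " -> ".join(steps) or "none")
```

For the `LogInfo(0x31120200)` entries above, this split gives originator PL, code 0x12 and subcode 0x0200, which agrees with the driver's own `Originator={PL}, Code={Abort}` decoding. It only summarizes what the driver reports; it does not explain why the IOC faults.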