[DRBD-user] OT: Disk errors

Matthias Weigel matthias.weigel at maweos.de
Mon May 31 11:40:29 CEST 2010


Hello,

One of my servers locks up with the disk errors quoted at the end of
this mail. It still answers pings, but can no longer access its disks.
I suspected a hardware fault, but IBM service replaced the controller
and the SAS backplane, and the error still persists.
Sometimes it happens while the kernel's initrd script is running, so it
is probably not DRBD related. But maybe somebody on this list can give
me a hint about what to look for.

OS is Red Hat 5 x86_64, kernel 2.6.18-164.6.1.el5.
Hardware is an IBM x3250.
Controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
2 disks: IBM-ESXS Model: ST3300655SS      Rev: BA26


Thanks in advance,

Matthias


May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in 
FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing 
HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in 
FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - 
   FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff810037fdf080, mf = 
ffff81007ed09700, idx=de
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff810037fdf200, mf = 
ffff81007ed09780, idx=df
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5e40, mf = 
ffff81007ed09800, idx=e0
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5cc0, mf = 
ffff81007ed09880, idx=e1
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5b40, mf = 
ffff81007ed09900, idx=e2
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a5840, mf = 
ffff81007ed09a00, idx=e4
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007f3a56c0, mf = 
ffff81007ed09a80, idx=e5

[ many more such messages ]

May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC 
FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - 
mpt_fault_reset_work: HardReset: success
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: 
LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in 
FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing 
HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in 
FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - 
   FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea18200, mf = 
ffff81007ed0ea80, idx=185
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea1b380, mf = 
ffff81007ed0eb00, idx=186
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC 
FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - 
mpt_fault_reset_work: HardReset: success
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in 
FAULT state (2000h)!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - Issuing 
HardReset from mpt_fault_reset_work!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - IOC is in 
FAULT state!!!
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - 
   FAULT code = 2000h
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007d659380, mf = 
ffff81007ed02880, idx=1
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007d659680, mf = 
ffff81007ed02900, idx=2
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa39c0, mf = 
ffff81007ed02980, idx=3
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa3cc0, mf = 
ffff81007ed02a00, idx=4

[ ... ]

May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa36c0, mf = 
ffff81007ed03700, idx=1e
May 31 09:10:45 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007eaa3e40, mf = 
ffff81007ed12700, idx=1fe
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: Recovered from IOC 
FAULT
May 31 09:10:45 xxx.yyy.113.1 kernel: mptbase: ioc0: WARNING - 
mpt_fault_reset_work: HardReset: success
May 31 09:10:47 xxx.yyy.113.1 kernel: mptbase: ioc0: 
LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:10:47 xxx.yyy.113.1 kernel: mptbase: ioc0: 
LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:11:15 xxx.yyy.113.1 kernel: mptbase: ioc0: 
LogInfo(0x31120200): Originator={PL}, Code={Abort}, SubCode(0x0200)
May 31 09:12:18 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting task 
abort! (sc=ffff81007ea1b380)
May 31 09:12:28 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:12:28 xxx.yyy.113.1 kernel: sd 0:0:1:0: mptscsih: ioc0: 
completing cmds: fw_channel 0, fw_id 2, sc=ffff81007ea1b380, mf = 
ffff81007ed0e380, idx=177
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt 
failed!
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: task abort: FAILED 
(sc=ffff81007ea1b380)
May 31 09:12:46 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting target 
reset! (sc=ffff81007ea1b380)
May 31 09:12:56 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt 
failed!
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: target reset: 
FAILED (sc=ffff81007ea1b380)
May 31 09:13:13 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting bus 
reset! (sc=ffff81007ea1b380)
May 31 09:13:23 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: Issue of TaskMgmt 
failed!
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: bus reset: FAILED 
(sc=ffff81007ea1b380)
May 31 09:13:44 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting host 
reset! (sc=ffff81007ea1b380)
May 31 09:13:44 xxx.yyy.113.1 kernel: mptbase: ioc0: Initiating recovery
May 31 09:14:05 xxx.yyy.113.1 kernel: mptscsih: ioc0: host reset: 
SUCCESS (sc=ffff81007ea1b380)
May 31 09:40:58 xxx.yyy.113.1 kernel: mptscsih: ioc0: attempting task 
abort! (sc=ffff810037fdf200)
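
For anyone who wants to condense a log like the one above, here is a
minimal Python sketch. It rejoins the lines the mail client wrapped,
counts how often the IOC was reported in FAULT state, and lists the
outcome of each error-handler step (task abort, target reset, bus reset,
host reset). It also splits a LogInfo word into its fields. The function
names are made up for illustration, and the bit layout (bus type,
originator, code, subcode) is an assumption based on how the mptbase
driver appears to decode SAS LogInfo values; it does agree with the
"Originator={PL}, SubCode(0x0200)" text in the log.

```python
import re

# Assumed syslog prefix, as in the excerpt: "May 31 09:10:45 host kernel: ..."
STAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")
# Result lines of the SCSI error handler, e.g. "bus reset: FAILED (sc=...)"
OUTCOME = re.compile(r"(task abort|target reset|bus reset|host reset): *(FAILED|SUCCESS)")

# Assumed originator numbering (hypothetical table, matches PL = 1 in the log)
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def unwrap(text):
    """Rejoin log records that were wrapped onto several lines."""
    records = []
    for line in text.splitlines():
        piece = line.strip()
        if STAMP.match(piece):
            records.append(piece)          # a new record starts with a timestamp
        elif piece and not piece.startswith("[") and records:
            records[-1] += " " + piece     # continuation of the previous record
    return records


def summarize(text):
    """Return (number of reported IOC faults, list of (step, result) pairs)."""
    records = unwrap(text)
    # Count only the first message of each fault, the one carrying the code.
    faults = sum("IOC is in FAULT state (" in r for r in records)
    outcomes = []
    for r in records:
        m = OUTCOME.search(r)
        if m:
            outcomes.append(m.groups())
    return faults, outcomes


def decode_loginfo(value):
    """Split a 32-bit LogInfo word into its assumed bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }
```

Calling summarize() on the saved log text and decode_loginfo(0x31120200)
on the LogInfo value gives a short overview without reading every
"completing cmds" line.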
