Hello all,
I have new information.
*******************************************************************************************************************
1. I found this in the logs on host1:
...
Apr 6 19:14:36 host1 Server Administrator: Storage Service EventID:
2405 Command timeout on physical disk: Physical Disk 0:0:2
Controller 0, Connector 0
Apr 6 19:14:37 host1 Server Administrator: Storage Service EventID:
2405 Command timeout on physical disk: Physical Disk 0:0:2
Controller 0, Connector 0
Apr 6 19:14:37 host1 Server Administrator: Storage Service EventID:
2405 Command timeout on physical disk: Physical Disk 0:0:2
Controller 0, Connector 0
Apr 6 19:14:37 host1 Server Administrator: Storage Service EventID:
2095 Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29
Sense qualifier: 0: Physical Disk 0:0:2 Controller 0, Connector 0
Apr 6 19:14:37 host1 Server Administrator: Storage Service EventID:
2095 Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29
Sense qualifier: 0: Physical Disk 0:0:2 Controller 0, Connector 0
...
2. Then I checked the physical drives with MegaCli64 -PDList -aALL|egrep
-i 'error' and it reported some errors:
host1
Media Error Count: 0
Other Error Count: 4
Media Error Count: 0
Other Error Count: 6
Media Error Count: 0
Other Error Count: 10
Media Error Count: 0
Other Error Count: 4
Media Error Count: 0
Other Error Count: 0
host2
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
Media Error Count: 0
Other Error Count: 0
3. I checked firmware versions with MegaCli64 -PDList -aALL|egrep -i
'firmware level|inquiry data':
host1
Device Firmware Level: 0001
Inquiry Data: S2B7J90Z909775SAMSUNG HE103SJ
1AJ30001
Device Firmware Level: 0001
Inquiry Data: S2B7J90Z908036SAMSUNG HE103SJ
1AJ30001
Device Firmware Level: 0001
Inquiry Data: S2B7J90Z908039SAMSUNG HE103SJ
1AJ30001
Device Firmware Level: 0001
Inquiry Data: S2B7J90Z910558SAMSUNG HE103SJ
1AJ30001
Device Firmware Level: 0001
Inquiry Data: S2B7J90Z909773SAMSUNG HE103SJ
1AJ30001Hotspare Information:
host2
Device Firmware Level: 1V02
Inquiry Data: WD-WCAW32135912WDC WD1003FBYX-18Y7B0
01.01V02
Device Firmware Level: 1V02
Inquiry Data: WD-WCAW32133592WDC WD1003FBYX-18Y7B0
01.01V02
Device Firmware Level: 1V02
Inquiry Data: WD-WCAW32105584WDC WD1003FBYX-18Y7B0
01.01V02
Device Firmware Level: 1V02
Inquiry Data: WD-WCAW32121292WDC WD1003FBYX-18Y7B0
01.01V02
Device Firmware Level: 1V02
Inquiry Data: WD-WCAW32128662WDC WD1003FBYX-18Y7B0
01.01V02Hotspare Information:
*******************************************************************************************************************
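The per-drive counters from step 2 can be paired up with a short script, which makes it easier to see which drives have non-zero counts. This is only a sketch: it assumes the egrep output is pasted in as plain text, the function name error_counts is made up for illustration, and the index it prints is just the order of appearance in the output, not the MegaCli slot number.

```python
import re

def error_counts(pdlist_text):
    """Return one (media_errors, other_errors) tuple per drive.

    Assumes the text holds alternating 'Media Error Count' and
    'Other Error Count' lines, as in the egrep output above.
    """
    media = [int(n) for n in re.findall(r"Media Error Count:\s*(\d+)", pdlist_text)]
    other = [int(n) for n in re.findall(r"Other Error Count:\s*(\d+)", pdlist_text)]
    return list(zip(media, other))

# Output for host1, copied from step 2.
host1 = """\
Media Error Count: 0
Other Error Count: 4
Media Error Count: 0
Other Error Count: 6
Media Error Count: 0
Other Error Count: 10
Media Error Count: 0
Other Error Count: 4
Media Error Count: 0
Other Error Count: 0
"""

for index, (media, other) in enumerate(error_counts(host1)):
    # Mark every drive that has at least one non-zero counter.
    flag = "  <-- non-zero" if media or other else ""
    print(f"drive #{index}: media={media} other={other}{flag}")
```

For host1 this marks four of the five drives. For host2 every counter is zero, so nothing would be marked.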
According to https://en.wikipedia.org/wiki/Key_Code_Qualifier,
"Sense key: 6 Sense code: 29 Sense qualifier: 0" means "Unit
Attention - POR or device reset occurred".
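The lookup can be sketched in a few lines of Python. Treat this as an illustration only: it assumes the three values in the log are hexadecimal (the usual SCSI convention), the tables contain just the one entry seen in the log on host1, and the names decode_sense, SENSE_KEYS and ASC_ASCQ are made up for this example.

```python
import re

# Only the codes seen in the log above; a real table would be much longer.
SENSE_KEYS = {0x6: "Unit Attention"}
ASC_ASCQ = {(0x29, 0x00): "Power on, reset, or bus device reset occurred"}

SENSE_RE = re.compile(
    r"Sense key:\s*([0-9A-Fa-f]+)\s+"
    r"Sense code:\s*([0-9A-Fa-f]+)\s+"
    r"Sense qualifier:\s*([0-9A-Fa-f]+)"
)

def decode_sense(line):
    """Return (sense key text, ASC/ASCQ text) for a log line, or None."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    # Assumption: the logged values are hexadecimal.
    key, asc, ascq = (int(group, 16) for group in match.groups())
    return (
        SENSE_KEYS.get(key, "unknown"),
        ASC_ASCQ.get((asc, ascq), "unknown"),
    )

# One event from the host1 log, joined back onto a single line.
sample = (
    "Apr 6 19:14:37 host1 Server Administrator: Storage Service EventID: "
    "2095 Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 "
    "Sense qualifier: 0: Physical Disk 0:0:2 Controller 0, Connector 0"
)
print(decode_sense(sample))
```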
Then I found a thread on the Dell forum
(http://en.community.dell.com/support-forums/servers/f/906/t/19426714.aspx)
saying that a hard drive reset is okay under some circumstances
(for example, under heavy load). I wonder whether that also holds
for DRBD, or whether such resets CAN cause DRBD issues?
Best regards,
Stanislav