When I do "smartctl -a /dev/sda" on this system, it usually
triggers errors in the system log.
In addition, self-tests never seem to complete, or the log is
misleading.
The platform is CentOS 5.5 running on a Zotac nForce 610i
Mini-ITX board with a Core2 E5400 and an OCZ Vertex2 SSD.
No problems in syslog when smartd was started:
Nov 19 13:12:19 harry smartd[26558]: smartd 5.41 2010-11-15 r3208 [i686-pc-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Nov 19 13:12:19 harry smartd[26558]: Opened configuration file /etc/smartd.conf
Nov 19 13:12:19 harry smartd[26558]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], opened
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], found in smartd database.
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], can't monitor Current Pending Sector count - no Attribute 197
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Nov 19 13:12:19 harry smartd[26558]: Monitoring 1 ATA and 0 SCSI devices
Nov 19 13:12:19 harry smartd[26560]: smartd has fork()ed into background mode. New PID=26560.
But when I ran smartctl -a /dev/sda, syslog showed:
Nov 19 13:13:12 harry kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 19 13:13:12 harry kernel: ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Nov 19 13:13:12 harry kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 19 13:13:12 harry kernel: ata1.00: status: { DRDY }
Nov 19 13:13:12 harry kernel: ata1: hard resetting link
Nov 19 13:13:12 harry kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 19 13:13:12 harry kernel: ata1.00: configured for UDMA/133
Nov 19 13:13:12 harry kernel: sd 0:0:0:0: timing out command, waited 20s
Nov 19 13:13:12 harry kernel: ata1: EH complete
Nov 19 13:13:12 harry kernel: SCSI device sda: 117231408 512-byte hdwr sectors (60022 MB)
Nov 19 13:13:12 harry kernel: sda: Write Protect is off
Nov 19 13:13:12 harry kernel: SCSI device sda: drive cache: write back
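To see which command actually timed out, the taskfile in the kernel's "cmd" line can be decoded. The following is a minimal Python sketch, assuming the usual libata layout of command/feature:count:lba_low:lba_mid:lba_high for the first group of bytes. The function name and the lookup tables are my own and cover only a few common SMART values.

```python
import re

# SMART subcommands: Features register values used with ATA command 0xB0.
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD4: "SMART EXECUTE OFF-LINE IMMEDIATE",
    0xD5: "SMART READ LOG",
    0xD6: "SMART WRITE LOG",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}

# A few SMART log addresses (passed in LBA low for SMART READ LOG).
SMART_LOGS = {
    0x00: "Log directory",
    0x01: "Summary SMART error log",
    0x06: "SMART self-test log",
    0x09: "Selective self-test log",
}

# Matches the first taskfile group of a libata "cmd" line.
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/([0-9a-f]{2}):([0-9a-f]{2}):"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
)


def decode_libata_cmd(line):
    """Return a dict describing the taskfile, or None if no 'cmd' group is found."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    command, feature, count, lba_low, lba_mid, lba_high = (
        int(x, 16) for x in m.groups()
    )
    info = {
        "command": command,
        "feature": feature,
        "sector_count": count,
        "lba_low": lba_low,
    }
    if command == 0xB0:  # ATA SMART command
        info["smart_subcommand"] = SMART_FEATURES.get(feature, "unknown")
        # SMART commands carry the signature 0x4F/0xC2 in LBA mid/high.
        info["signature_ok"] = (lba_mid, lba_high) == (0x4F, 0xC2)
        if feature == 0xD5:
            info["log"] = SMART_LOGS.get(lba_low, "unknown")
    return info


if __name__ == "__main__":
    line = "ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in"
    print(decode_libata_cmd(line))
```

For the line logged here, this reports a SMART READ LOG of one sector from log address 0x06, the SMART self-test log, which would fit a log read being the command that hangs.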
I do not see any disk errors like this in any other use of the
system.
There was a long pause at the end of the smartctl console output
before it returned to the command prompt. Here's the output:
[root@harry smartmontools]# smartctl -a /dev/sda
smartctl 5.41 2010-11-15 r3208 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: SandForce Driven SSDs
Device Model: OCZ-VERTEX2
Serial Number: OCZ-8F31137N5HO99D5O
Firmware Version: 1.24
User Capacity: 60,022,480,896 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Fri Nov 19 13:16:05 2010 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7f) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 48) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   099   050    Pre-fail  Always       -       0/17417935
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       180h+48m+28.000s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       58
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       28
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       0
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   001   129   000    Old_age   Always       -       1 (0 127 0 129)
195 ECC_Uncorr_Error_Count  0x001c   108   099   000    Old_age   Offline      -       0/17417935
196 Reallocated_Event_Count 0x0033   100   100   000    Pre-fail  Always       -       0
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       384
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2816
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2816
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       64
Error SMART Error Log Read failed: Input/output error
Smartctl: SMART Error Log Read Failed
Error SMART Error Self-Test Log Read failed: Input/output error
Smartctl: SMART Self Test Log Read Failed
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
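When the log reads fail like this, smartctl's exit status can also show that something went wrong, since the man page documents it as a bitmask. Here is a small Python 3 sketch that runs smartctl and decodes that bitmask. The function names are my own, the bit descriptions are paraphrased from the man page, and I have not recorded what status this particular run returned.

```python
import subprocess

# smartctl exit status bits, paraphrased from the smartctl man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device is in a low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [text for bit, text in sorted(EXIT_BITS.items())
            if status & (1 << bit)]


def run_smartctl(device="/dev/sda"):
    """Run 'smartctl -a' on the device; return (exit status, combined output)."""
    proc = subprocess.run(
        ["smartctl", "-a", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    status, output = run_smartctl()
    for reason in decode_exit_status(status):
        print("smartctl reported:", reason)
```

A failed log read would presumably show up as bit 2, which could make it easier to catch the intermittent failures from a script instead of watching syslog.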
I then tried to do self-tests on the drive - first a short test,
then later a long test.
Strangely, when I then ran smartctl -a again, there were no syslog
errors and no delay!
Yet some days later, running smartctl -a triggered the errors and
the delay again.
There is something odd with the self-test status. Long after the
tests should have completed, they do seem to be complete according
to the first part of the -a output:
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
But the extended self-test log still shows them as in progress (I
used the extended log because the regular -a output didn't show
anything):
[root@harry smartmontools]# smartctl -l xselftest /dev/sda
smartctl 5.41 2010-11-15 r3208 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
General Purpose Logging (GPL) feature set supported
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%       181         -
# 2  Short offline       Self-test routine in progress 90%       181         -
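The mismatch between the two outputs can be checked mechanically. The Python sketch below parses self-test log entries like the ones above and flags any that still claim "in progress" while the self-test execution status byte says no test is running. It assumes the high nibble of that byte is 0xF only while a test is running, and the helper names are hypothetical.

```python
import re

# One log entry: "# <num> <description and status> <NN>% <hours> <lba>".
ENTRY_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftest_log(text):
    """Extract the self-test log entries from smartctl output text."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m is None:
            continue  # header or unrelated line
        num, desc_status, remaining, hours, lba = m.groups()
        entries.append({
            "num": int(num),
            "text": desc_status,
            "in_progress": "in progress" in desc_status,
            "remaining_percent": int(remaining),
            "lifetime_hours": int(hours),
            "lba_of_first_error": lba,
        })
    return entries


def stale_in_progress(entries, execution_status):
    """Return entries marked in progress although no test is running.

    execution_status is the value smartctl prints in parentheses on the
    'Self-test execution status' line.
    """
    test_running = (execution_status >> 4) == 0xF
    if test_running:
        return []
    return [e for e in entries if e["in_progress"]]


if __name__ == "__main__":
    log = """\
# 1 Extended offline Self-test routine in progress 90% 181 -
# 2 Short offline Self-test routine in progress 90% 181 -
"""
    for entry in stale_in_progress(parse_selftest_log(log), 0):
        print("stale entry:", entry["num"], entry["text"])
```

With the execution status of 0 reported by -a, both entries above are flagged, which is the inconsistency I am asking about.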
Why the bogus in-progress/remaining info?
If you need me to gather more info, let me know.