When I do "smartctl -a /dev/sda" on this system, it usually triggers errors in the system log. In addition, self-tests never seem to complete, or the log is misleading.

The platform is CentOS 5.5 running on a Zotac nForce 610i Mini-ITX with a Core2 E5400 and an OCZ Vertex2 SSD.

No problems in syslog when smartd was started:

Nov 19 13:12:19 harry smartd[26558]: smartd 5.41 2010-11-15 r3208 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Nov 19 13:12:19 harry smartd[26558]: Opened configuration file /etc/smartd.conf
Nov 19 13:12:19 harry smartd[26558]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], opened
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], found in smartd database.
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], can't monitor Current Pending Sector count - no Attribute 197
Nov 19 13:12:19 harry smartd[26558]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Nov 19 13:12:19 harry smartd[26558]: Monitoring 1 ATA and 0 SCSI devices
Nov 19 13:12:19 harry smartd[26560]: smartd has fork()ed into background mode. New PID=26560.

But when I ran smartctl -a /dev/sda, syslog showed:

Nov 19 13:13:12 harry kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 19 13:13:12 harry kernel: ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Nov 19 13:13:12 harry kernel:          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 19 13:13:12 harry kernel: ata1.00: status: { DRDY }
Nov 19 13:13:12 harry kernel: ata1: hard resetting link
Nov 19 13:13:12 harry kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 19 13:13:12 harry kernel: ata1.00: configured for UDMA/133
Nov 19 13:13:12 harry kernel: sd 0:0:0:0: timing out command, waited 20s
Nov 19 13:13:12 harry kernel: ata1: EH complete
Nov 19 13:13:12 harry kernel: SCSI device sda: 117231408 512-byte hdwr sectors (60022 MB)
Nov 19 13:13:12 harry kernel: sda: Write Protect is off
Nov 19 13:13:12 harry kernel: SCSI device sda: drive cache: write back

I do not see any disk errors like this in any other use of the system. There was a long pause at the end of the smartctl console output before it returned to the command prompt. Here's the output:

[root@harry smartmontools]# smartctl -a /dev/sda
smartctl 5.41 2010-11-15 r3208 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     OCZ-VERTEX2
Serial Number:    OCZ-8F31137N5HO99D5O
Firmware Version: 1.24
User Capacity:    60,022,480,896 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Fri Nov 19 13:16:05 2010 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   099   050    Pre-fail  Always       -       0/17417935
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       180h+48m+28.000s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       58
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       28
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       0
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   001   129   000    Old_age   Always       -       1 (0 127 0 129)
195 ECC_Uncorr_Error_Count  0x001c   108   099   000    Old_age   Offline      -       0/17417935
196 Reallocated_Event_Count 0x0033   100   100   000    Pre-fail  Always       -       0
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       384
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       2816
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       2816
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       64

Error SMART Error Log Read failed: Input/output error
Smartctl: SMART Error Log Read Failed
Error SMART Error Self-Test Log Read failed: Input/output error
Smartctl: SMART Self Test Log Read Failed
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
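
For anyone reading the kernel exception above, the "cmd" line can be decoded to see which ATA command timed out. The following is only a minimal sketch and not part of the original report: the function name decode_taskfile is hypothetical, the field order (command/feature:nsect:lbal:lbam:lbah/...) is assumed from the way libata prints a failed taskfile, and the two lookup tables are a partial reading of the ATA-8 ACS SMART feature set, so they should be checked against the specification before relying on them.

```python
import re

# Assumed libata layout of a reported taskfile:
# command/feature:nsect:lbal:lbam:lbah/<five hob fields>/device
TF_RE = re.compile(
    r"cmd (?P<command>[0-9a-f]{2})/(?P<feature>[0-9a-f]{2}):"
    r"(?P<nsect>[0-9a-f]{2}):(?P<lbal>[0-9a-f]{2}):"
    r"(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
)

# Partial tables, taken from the ATA-8 ACS SMART feature set (assumption:
# only the entries relevant to smartctl -a are listed here).
SMART_SUBCOMMANDS = {
    0xD0: "SMART READ DATA",
    0xD4: "SMART EXECUTE OFF-LINE IMMEDIATE",
    0xD5: "SMART READ LOG",
    0xD6: "SMART WRITE LOG",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}
SMART_LOGS = {
    0x00: "log directory",
    0x01: "summary SMART error log",
    0x02: "comprehensive SMART error log",
    0x06: "SMART self-test log",
    0x09: "selective self-test log",
}


def decode_taskfile(line):
    """Return a short description of the command in a libata 'cmd' line."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no libata taskfile found in line")
    tf = {name: int(value, 16) for name, value in m.groupdict().items()}
    if tf["command"] != 0xB0:  # 0xB0 is the SMART command opcode
        return "non-SMART command 0x%02x" % tf["command"]
    name = SMART_SUBCOMMANDS.get(
        tf["feature"], "unknown SMART subcommand 0x%02x" % tf["feature"]
    )
    if tf["feature"] in (0xD5, 0xD6):
        # For READ/WRITE LOG the low LBA byte carries the log address.
        log = SMART_LOGS.get(tf["lbal"], "log 0x%02x" % tf["lbal"])
        return "%s, %s, %d sector(s)" % (name, log, tf["nsect"])
    return name


if __name__ == "__main__":
    print(decode_taskfile(
        "ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in"
    ))
```

Under those assumptions, the logged command is a one-sector SMART READ LOG of the self-test log, which would be consistent with the "Self-Test Log Read failed" message in the smartctl output. It does not by itself explain the failed error log read.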

I then tried to run self-tests on the drive: first a short test, then later a long test. Strangely, when I ran smartctl -a again, there were no syslog errors and no delay! Yet some days later, running smartctl -a triggers the errors and the delay again.

There is also something odd with the self-test status. Long after the tests should have completed, they would seem to be complete according to the first part of the -a output:

Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.

The extended self-test log shows the following (I used the extended log because the regular -a output didn't show anything):

[root@harry smartmontools]# smartctl -l xselftest /dev/sda
smartctl 5.41 2010-11-15 r3208 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
General Purpose Logging (GPL) feature set supported
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress  90%        181              -
# 2  Short offline       Self-test routine in progress  90%        181              -

Why the bogus in-progress/remaining info?

If you need me to gather more info, let me know.