Farewell HD, I knew thee well!


    Yesterday, while playing Minecraft full screen, my computer suddenly rebooted! As I watched the boot lines scroll by I noticed a btrfs error correction event. That led to a btrfs scrub, which recovered two unreadable-sector errors. Then came the smartctl short test, which showed:
    Code:
    smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-34-generic] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Toshiba 2.5" HDD MK..59GSXP (AF)
    Device Model:     TOSHIBA MK6459GSXP
    Serial Number:    7162F6U4S
    LU WWN Device Id: 5 000039 3668014e2
    Firmware Version: GN003J
    User Capacity:    640,135,028,736 bytes [640 GB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    Form Factor:      2.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS (minor revision not indicated)
    SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Sun Aug  7 23:44:51 2016 CDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)    Offline data collection activity
                        was never started.
                        Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)    The previous self-test routine completed
                        without error or no self-test has ever 
                        been run.
    Total time to complete Offline 
    data collection:         (  120) seconds.
    Offline data collection
    capabilities:              (0x5b) SMART execute Offline immediate.
                        Auto Offline data collection on/off support.
                        Suspend Offline collection upon new
                        command.
                        Offline surface scan supported.
                        Self-test supported.
                        No Conveyance Self-test supported.
                        Selective Self-test supported.
    SMART capabilities:            (0x0003)    Saves SMART data before entering
                        power-saving mode.
                        Supports SMART auto save timer.
    Error logging capability:        (0x01)    Error logging supported.
                        General Purpose Logging supported.
    Short self-test routine 
    recommended polling time:      (   2) minutes.
    Extended self-test routine
    recommended polling time:      ( 203) minutes.
    SCT capabilities:            (0x003d)    SCT Status supported.
                        SCT Error Recovery Control supported.
                        SCT Feature Control supported.
                        SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
      3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1610
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1898
      5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       24
      7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
      9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       11206
     10 Spin_Retry_Count        0x0033   137   100   030    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1665
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       103
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       67
    193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       28783
    194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       40 (Min/Max 12/46)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       73
    222 Loaded_Hours            0x0032   076   076   000    Old_age   Always       -       9779
    223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
    224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
    226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       333
    240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
    
    SMART Error Log Version: 1
    ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.
    
    Error 2 occurred at disk power-on lifetime: 11205 hours (466 days + 21 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 41 d2 e0 45 8a 66  Error: UNC at LBA = 0x068a45e0 = 109725152
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 d0 e0 45 8a 40 00      00:30:07.139  READ FPDMA QUEUED
      60 08 c8 d8 45 8a 40 00      00:30:07.139  READ FPDMA QUEUED
      60 08 c0 d0 45 8a 40 00      00:30:07.139  READ FPDMA QUEUED
      60 08 b8 c8 45 8a 40 00      00:30:07.139  READ FPDMA QUEUED
      60 08 b0 c0 45 8a 40 00      00:30:07.138  READ FPDMA QUEUED
    
    Error 1 occurred at disk power-on lifetime: 11205 hours (466 days + 21 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 41 92 e0 45 8a 66  Error: UNC at LBA = 0x068a45e0 = 109725152
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 40 90 00 45 8a 40 00      00:29:55.460  READ FPDMA QUEUED
      60 80 88 60 43 8a 40 00      00:29:55.458  READ FPDMA QUEUED
      60 40 80 00 43 8a 40 00      00:29:55.457  READ FPDMA QUEUED
      60 a0 78 40 42 8a 40 00      00:29:55.457  READ FPDMA QUEUED
      60 60 70 c0 3f 8a 40 00      00:29:55.451  READ FPDMA QUEUED
    
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%     11206         -
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    which showed that eight of the 25 tests were "pre-failure" and the rest were old age. This laptop was purchased in 2010. I boot it up about eight out of every ten days and leave it on until I shut it down at night. The "start-stop" count corresponds to 5.2 years of ons and offs.
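For anyone who wants to check that arithmetic, here is a rough sketch in Python using the raw values from the smartctl table above. The boots-per-day figures are assumptions taken from the description in this post, and the function names are made up for illustration.

```python
# Raw values copied from the smartctl attribute table in this post.
START_STOP_COUNT = 1898   # attribute 4, Start_Stop_Count
POWER_ON_HOURS = 11206    # attribute 9, Power_On_Hours


def years_of_boots(start_stops, boots_per_day=1.0):
    """Years needed to reach start_stops at an assumed boots_per_day rate."""
    return start_stops / boots_per_day / 365.0


def hours_to_days_and_hours(hours):
    """Split an hour count into whole days and leftover hours."""
    return divmod(hours, 24)


if __name__ == "__main__":
    # Assumption: exactly one start/stop per calendar day.
    print(round(years_of_boots(START_STOP_COUNT), 1))
    # Assumption: booted eight days out of every ten.
    print(round(years_of_boots(START_STOP_COUNT, 0.8), 1))
    print(hours_to_days_and_hours(POWER_ON_HOURS))
```

At one boot per day the count works out to about 5.2 years; at eight boots in ten days it is closer to 6.5 years, which roughly fits a 2010 purchase. The power-on hours split into 466 days and 22 hours, one hour past the figure in the error log.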

    I'm starting the long test (the GSmartControl app is neat!) and then taking my grandson bass fishing. Today is my 75th birthday. It looks like I'll be buying myself a birthday present, a replacement HD for $75.
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people."
    – John F. Kennedy, February 26, 1962.

    #2
    Well then, Happy Birthday!! And so, https://m.youtube.com/watch?v=xxOviBI-8fc
    Using Kubuntu Linux since March 23, 2007
    "It is a capital mistake to theorize before one has data." - Sherlock Holmes



      #3



        #4
        LOL that was nice @ Snowhog

        HAPPY Birthday GG and wishing you many more !!

        VINNY
        i7 4core HT 8MB L3 2.9GHz
        16GB RAM
        Nvidia GTX 860M 4GB RAM 1152 cuda cores



          #5
          Happy Birthday GreyGeek
          Linux User #454271



            #6
            Happy Birthday GG

            Originally posted by GreyGeek
            which showed that eight of the 25 tests were "pre-failure" and the rest were old age.
            Those just tell the attribute type, not the actual result. Brand new and fully functional drives will show exactly the same types; they just indicate whether "bad" results suggest "an impending failure" or just "old age which increases chances of failure".

            The important columns when interpreting the results are VALUE (current value), WORST (worst value), THRESH ("bad result" threshold)...and the RAW_VALUE (depending on the test).

            Looking at your results, the only pre-fail result with a non-zero RAW_VALUE count is "Reallocated_Sector_Ct" and that number is not necessarily alarming by itself (bad sectors are not that uncommon) unless it starts increasing abnormally. "Spin_Up_Time" also has a non-zero RAW_VALUE, but that just shows the spin up time in milliseconds (not really a "count").

            Not saying the drive isn't failing, increasing bad sector errors are usually bad news, just that you shouldn't throw a disk away just because you see "old age" or "pre-fail" on the test type column.
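To make the column reading concrete, here is a minimal Python sketch that parses a few rows copied from the table earlier in this thread. It assumes the ten-column layout shown there; the `Attr` tuple and the helper names are invented for this example and are not part of smartmontools. It treats an attribute as failing when VALUE has dropped to or below a non-zero THRESH, and separately lists pre-fail attributes with a non-zero raw number.

```python
from typing import List, NamedTuple, Optional


class Attr(NamedTuple):
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    type: str
    raw: str


# A few rows copied from the smartctl output posted in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1610
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       24
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       11206
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       40 (Min/Max 12/46)
"""


def parse_attributes(text: str) -> List[Attr]:
    """Parse attribute rows; skip the header and anything that is not a row."""
    attrs = []
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attrs.append(Attr(int(parts[0]), parts[1], int(parts[3]), int(parts[4]),
                          int(parts[5]), parts[6], " ".join(parts[9:])))
    return attrs


def raw_int(attr: Attr) -> Optional[int]:
    """First number of RAW_VALUE, or None if it is not a plain integer."""
    try:
        return int(attr.raw.split()[0])
    except (ValueError, IndexError):
        return None


def failing(attrs: List[Attr]) -> List[str]:
    """Names of attributes whose VALUE is at or below a non-zero THRESH."""
    return [a.name for a in attrs if a.thresh > 0 and a.value <= a.thresh]


def nonzero_prefail(attrs: List[Attr]) -> List[str]:
    """Names of Pre-fail attributes whose raw number is non-zero."""
    return [a.name for a in attrs if a.type == "Pre-fail" and raw_int(a)]


if __name__ == "__main__":
    rows = parse_attributes(SAMPLE)
    print("failing:", failing(rows))
    print("pre-fail with non-zero raw:", nonzero_prefail(rows))
```

On these rows nothing is at or below its threshold, and the only pre-fail attributes with a non-zero raw number are Spin_Up_Time and Reallocated_Sector_Ct. Raw values are vendor-specific, so a non-zero raw number is only a hint to look closer, not a verdict.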

            (https://en.wikipedia.org/wiki/S.M.A.....T._attributes)
            Last edited by kubicle; Aug 09, 2016, 01:59 AM.



              #7
              Happy Birthday, GG, and many more too!

              In my experience, spontaneous reboot is more often caused by a failing power supply or overheated CPU than the hdd. But your hdd is getting a little long in the tooth, for sure. While you have the laptop opened up, give the CPU heat sink and fan a cleaning and that may help as much as a new hdd.



                #8
                Belated happy birthday, GG
                we see things not as they are, but as we are.
                -- anais nin



                  #9
                  dibl: In my experience, spontaneous reboot is more often caused by a failing power supply or overheated CPU than the hdd. But your hdd is getting a little long in the tooth, for sure. While you have the laptop opened up, give the CPU heat sink and fan a cleaning and that may help as much as a new hdd.
                  Yep, that's my experience, too.

                  Happy B-Day GG! Wishing you many more, to boot! (pun, there, ya know ...)
                  An intellectual says a simple thing in a hard way. An artist says a hard thing in a simple way. Charles Bukowski



                    #10
                    Happy B-Day GG!!



                      #11
                      Originally posted by dibl
                      Happy Birthday, GG, and many more too!

                      In my experience, spontaneous reboot is more often caused by a failing power supply or overheated CPU than the hdd. But your hdd is getting a little long in the tooth, for sure. While you have the laptop opened up, give the CPU heat sink and fan a cleaning and that may help as much as a new hdd.
                      Thanks for the HB wishes, folks! When I was a lot younger I never thought I'd get to this age, considering my "close encounters" of the past (stupid driving techniques and stunts, events while a LEO, a cockpit fire when flying a Cessna 172, hydraulic failures on a Boeing 727 leading to a high-speed emergency landing -- did you know it has no hydraulic redundancy? -- and many more ...).

                      An overheated GPU can blank the screen or trigger a reboot as well. That and all the things you folks have mentioned I have experienced at one time or another. I keep cans of air by my computer and regularly blow it out. I also have it sitting on a cooling fan and I have a CPU temp widget in the panel. My CPU has a max T of 95C. The highest temperature I have happened to notice on the monitor was 74C.

                      My laptop was purchased in 2010. About a year ago I had a sudden reboot episode (don't remember what I was doing) and that prompted me to install the CPU temp monitor, because I suspected that it was caused by the CPU getting too hot and never looked for any other cause. As I write this my CPU temp is idling around 39C. My smartctl long test yesterday didn't add anything new. Just to be safe I ordered a replacement drive to have on hand in case of failure. And I am going to use btrfs to send a copy of my @ and @home to another HD.
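A minimal sketch of that btrfs send plan, written as a Python dry run that only prints the commands instead of running them. All the paths (`/mnt/top`, `/mnt/top/snapshots`, `/mnt/backup`) and the label are hypothetical placeholders; it assumes the top-level volume is mounted, the snapshot directory already exists on the same btrfs filesystem as the subvolumes, and the other HD is itself formatted as btrfs.

```python
import shlex
from typing import List


def backup_commands(subvol: str, snap_dir: str, dest_mount: str, label: str) -> List[str]:
    """Build the two shell commands for one snapshot-and-send backup."""
    snap = f"{snap_dir}/{label}"
    q = shlex.quote
    return [
        # btrfs send only accepts read-only snapshots, hence -r.
        f"btrfs subvolume snapshot -r {q(subvol)} {q(snap)}",
        # Stream the snapshot to the btrfs filesystem on the other drive.
        f"btrfs send {q(snap)} | btrfs receive {q(dest_mount)}",
    ]


if __name__ == "__main__":
    # Hypothetical layout: top-level volume at /mnt/top, backup drive at /mnt/backup.
    for name in ("@", "@home"):
        for cmd in backup_commands(f"/mnt/top/{name}", "/mnt/top/snapshots",
                                   "/mnt/backup", f"{name}-20160809"):
            print(cmd)  # dry run: review, then run by hand as root
```

Printing rather than executing keeps the sketch harmless; the commands can be pasted into a root shell once the paths have been checked against the real mount points.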

                      Oh, Minecraft 1.10.2 recovered without a loss of anything. It must have happened right after an automatic save (which may have caused it?), so I didn't lose the 17 diamonds I found while mining at level 5.



                        #12
                        so I didn't lose the 17 diamonds I found
                        Yeah, my youngest and I were doing good and had found and taken several of those, only to get trapped in an area with an overactive zombie spawner. Couldn't fight off enough and lost them all!! I know how it would feel to lose those!



                          #13
                          Happy birthday.
                          I missed this thread. My main computer is also about five years old and very cheap. Every month I check the hard disk, and for as long as I can remember I have seen that pre-fail and old age. But the disk is running fine. Just like me, also pre-fail and old-age but still running :)
                          A few years ago I did a search for pre-fail and old-age. I don't remember the details exactly, but it has something to do with the expected lifetime as set by the manufacturer. Since then I don't worry about it anymore. I just make regular backups, so if the disk dies, I still have everything.



                            #14
                            Happy birthday Jerry!

                            Originally posted by dibl
                            In my experience, spontaneous reboot is more often caused by a failing power supply or overheated CPU than the hdd. While you have the laptop opened up, give the CPU heat sink and fan a cleaning and that may help as much as a new hdd.
                            Agreed.

                            And as others have stated: "pre-fail" and "old age" are the normal responses. However, I have several drives with more than 60,000 Power_On_Hours and no reallocated sectors, so I'm not sure I would call the drive old. The bad sectors are troublesome, but only if they're increasing. If you're seeing an uptick in reallocation, I'd replace it fast.
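One rough way to watch for that uptick is to note the Reallocated_Sector_Ct raw value after each check and compare the newest reading with the oldest. The Python sketch below does just that; the first reading (24) is from the smartctl output in this thread, the later readings and dates are hypothetical, and the function name and `tolerance` parameter are made up for illustration.

```python
from typing import List, Tuple


def is_increasing(history: List[Tuple[str, int]], tolerance: int = 0) -> bool:
    """True if the newest raw count exceeds the oldest by more than tolerance."""
    if len(history) < 2:
        return False
    return history[-1][1] - history[0][1] > tolerance


if __name__ == "__main__":
    stable = [("2016-08-07", 24), ("2016-09-07", 24)]    # second reading is hypothetical
    growing = [("2016-08-07", 24), ("2016-09-07", 40)]   # second reading is hypothetical
    print(is_increasing(stable))
    print(is_increasing(growing))
```

A flat count suggests the drive remapped a few sectors once and settled down; a climbing count is the signal to replace it.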

                            Might I suggest a small SSD instead of an HD? Less power, less heat, less likely to be damaged by a drop, and I suspect you don't really need a full 640GB for a Linux machine. Besides, you can always keep the SSD when the laptop finally dies and move it to another computer. I prefer Samsung PRO drives, but if you want to spend dollars more locally, try a Mushkin drive. They're based in Englewood, CO, so at least some of your money stays here.



                              #15
                              I checked out the Samsung Pro 256GB drive on Amazon. A couple thousand reviews, with 3% of those giving it one star. The firmware update accounted for a lot of the negative reviews, but there were too many who stated it just failed to boot without previous warning. A lot of the failures occurred before two years had passed. That's for a $155 drive. A cheaper one won't be better, IMO.
