HDD failure warning and monitoring

This topic is closed.
    [PLASMA 5] HDD failure warning and monitoring

    I read about the Plasma 5.20 features on the KDE blog. One impressive feature in 5.20 is that it can warn you when a hard disk is about to fail, and the Info Center app lets you check the disk's status. It is a great feature to have. I am using 5.18 and hope to find a way to get the same warning and status info on my 20.04 install. Is there any way to do this? Thanks.
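
One way to approximate such a warning on an older Plasma release is to poll smartctl from a script and react to its exit status. The sketch below assumes smartmontools is installed and that the script runs as root. The bit meanings are paraphrased from the smartctl man page; the function names and the idea of running it from cron are only illustrative.

```python
import subprocess

# Exit-status bits as described in the smartctl(8) man page (paraphrased).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or returned a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "a prefail attribute is at or below its threshold",
    5: "an attribute was at or below its threshold in the past",
    6: "the device error log contains errors",
    7: "the self-test log contains errors",
}


def decode_exit_status(status):
    """Return a message for every bit that is set in a smartctl exit status."""
    return [msg for bit, msg in EXIT_BITS.items() if status & (1 << bit)]


def check_disk(device):
    """Hypothetical helper: run 'smartctl -H -A' on a device and decode the result."""
    result = subprocess.run(["smartctl", "-H", "-A", device],
                            capture_output=True, text=True)
    return decode_exit_status(result.returncode)


# The decoder can be tried without touching a disk:
print(decode_exit_status(8))
```

Calling `check_disk("/dev/sda")` from a daily cron job and mailing any non-empty result would give a crude version of the warning; the device path is a placeholder.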

    #2
    I use this app:
    https://www.hdsentinel.com/hard_disk..._linux_gui.php

    I used to use S.M.A.R.T. to obtain the total terabytes written and the total uptime and then calculate the days remaining on my two SSDs. The app also monitors my spinning disk, which has 16 bad sectors.
    This app computes health and days remaining automatically. You can minimize it in your system tray but I didn't do that.
    I run it about once a week or so, IF I think about it.
    "A nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people.”
    – John F. Kennedy, February 26, 1962.
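
The manual calculation described in this post is not spelled out, so the following is only one plausible reading of it: take the average write rate over the drive's powered-on time and see how long the remaining rated endurance would last at that rate. The rated TBW figure has to come from the manufacturer's spec sheet; the numbers below are made up for illustration.

```python
def days_remaining(tb_written, power_on_hours, rated_tbw):
    """Estimate days of write endurance left at the average write rate so far."""
    if tb_written <= 0 or power_on_hours <= 0:
        raise ValueError("need positive usage figures")
    tb_per_day = tb_written / (power_on_hours / 24)
    # Clamp at zero so a drive past its rating does not report negative days.
    return max(rated_tbw - tb_written, 0) / tb_per_day


# Made-up example: 20 TB written over 24000 power-on hours (1000 days)
# against a hypothetical 150 TBW rating.
print(round(days_remaining(20.0, 24000, 150.0)))
```

This only counts powered-on days and assumes the write rate stays constant, so it is a rough guide at best.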



      #3
      Thanks for the link, GreyGeek. I've downloaded it for both my computers (64-bit and 32-bit) and it works fine. And the hard disks have over three more years of life.



        #4
        Thanks. I will try it.



          #5
          Originally posted by GreyGeek
          I use this app:
          https://www.hdsentinel.com/hard_disk..._linux_gui.php

          I used to use S.M.A.R.T. to obtain the total terabytes written and the total uptime and then calculate the days remaining on my two SSDs. The app also monitors my spinning disk, which has 16 bad sectors.
          This app computes health and days remaining automatically. You can minimize it in your system tray but I didn't do that.
          I run it about once a week or so, IF I think about it.
          Thanks! Works great! Been looking for something like this!



            #6
            Interesting application. Not sure how useful it is: it gives my Samsung 840 Pro 256GB SSDs 84 and 93 days to 'live', both due to "#177 Wear Leveling Count". However, not all drive makers report their wear leveling count in the same way. Some start at 0 and go to 100, some in reverse. So either these drives are REALLY close to dying or the report is wrong. We'll see. They report 84/88 as those values.
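
As a rough cross-check on the 84-day figure, the normalized value can be extrapolated by hand. The sketch below assumes the reading in which attribute 177 starts at 100 and counts down linearly towards a threshold of 0, which, as noted above, not every vendor follows. It uses the value 84 together with the Power_On_Hours reading of 48262 from the smartctl output later in this post.

```python
def hours_left(normalized_value, power_on_hours, start=100, threshold=0):
    """Linear extrapolation of a countdown-style SMART attribute."""
    used = start - normalized_value
    if used <= 0:
        return None  # no measurable wear yet, nothing to extrapolate
    hours_per_point = power_on_hours / used
    return (normalized_value - threshold) * hours_per_point


h = hours_left(84, 48262)
print(f"{h / 24 / 365:.1f} years of powered-on time")
```

Under that assumption the drive would have decades rather than days left, which suggests the 84-day estimate comes from a different interpretation of the same attribute. It does not show which interpretation is right.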

            BTW, if you want to launch this app from kmenu, use the full path in the "Command:" entry AND set the "Work path:" in Advanced. It doesn't appear to accept the sudo password if you try to launch it from outside its custom directory.

            Another useful tool, often cited as more accurate than smartctl, is skdump from the libatasmart-bin package. To compare output:

            smartctl:
            Code:
            stuart@office:~$ sudo smartctl -A /dev/sdc
            smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.3.0-62-generic] (local build)
            Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
            
            === START OF READ SMART DATA SECTION ===
            SMART Attributes Data Structure revision number: 1
            Vendor Specific SMART Attributes with Thresholds:
            ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
              5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
              9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       48262
             12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       727
            177 Wear_Leveling_Count     0x0013   084   084   000    Pre-fail  Always       -       572
            179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
            181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
            182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
            183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
            187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
            190 Airflow_Temperature_Cel 0x0032   066   044   000    Old_age   Always       -       34
            195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
            199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       15
            235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       523
            241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       38488776948
            
            skdump:
            Code:
            stuart@office:~$ sudo skdump /dev/sdc
            Device: sat16:/dev/sdc
            Type: 16 Byte SCSI ATA SAT Passthru
            Size: 244198 MiB
            Model: [Samsung SSD 840 PRO Series]
            Serial: [S12RNEACC52824T]
            Firmware: [DXM05B0Q]
            SMART Available: yes
            Quirks:
            Awake: yes
            SMART Disk Health Good: yes
            Off-line Data Collection Status: [Off-line data collection activity was never started.]
            Total Time To Complete Off-Line Data Collection: 53956 s
            Self-Test Execution Status: [The previous self-test routine completed without error or no self-test has ever been run.]
            Percent Self-Test Remaining: 0%
            Conveyance Self-Test Available: no
            Short/Extended Self-Test Available: yes
            Start Self-Test Available: yes
            Abort Self-Test Available: yes
            Short Self-Test Polling Time: 2 min
            Extended Self-Test Polling Time: 20 min
            Conveyance Self-Test Polling Time: 0 min
            Bad Sectors: 0 sectors
            Powered On: 5.5 years
            Power Cycles: 727
            Average Powered On Per Power Cycle: 2.8 days
            Temperature: 34.0 C
            Attribute Parsing Verification: Good
            Overall Status: GOOD
            ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
              5 reallocated-sector-count    100   100    10   0 sectors   0x000000000000 prefail online  yes  yes
              9 power-on-hours               90    90     0   5.5 years   0x86bc00000000 old-age online  n/a  n/a
             12 power-cycle-count            99    99     0   727         0xd70200000000 old-age online  n/a  n/a
            177 wear-leveling-count          84    84     0   572         0x3c0200000000 prefail online  n/a  n/a
            179 used-reserved-blocks-total  100   100    10   0           0x000000000000 prefail online  yes  yes
            181 program-fail-count-total    100   100    10   0           0x000000000000 old-age online  yes  yes
            182 erase-fail-count-total      100   100    10   0           0x000000000000 old-age online  yes  yes
            183 runtime-bad-block-total     100   100    10   0           0x000000000000 prefail online  yes  yes
            187 reported-uncorrect          100   100     0   0 sectors   0x000000000000 old-age online  n/a  n/a
            190 airflow-temperature-celsius  66    44     0   34.0 C      0x220000000000 old-age online  n/a  n/a
            195 hardware-ecc-recovered      200   200     0   0           0x000000000000 old-age online  n/a  n/a
            199 udma-crc-error-count         99    99     0   15          0x0f0000000000 old-age online  n/a  n/a
            235 good-block-rate              99    99     0   n/a         0x0b0200000000 old-age online  n/a  n/a
            241 total-lbas-written           99    99     0   1291469.049 TB 0xf4201cf60800 old-age online  n/a  n/a
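
To line the two listings up, it helps to notice that the skdump "Raw" column appears to be the same six raw bytes that smartctl prints as a decimal RAW_VALUE, just shown in little-endian order. The sketch below decodes a few of the values above on that assumption and compares them with the smartctl figures. The TB conversion at the end additionally assumes that attribute 241 on this drive counts 512-byte LBAs, which the thread does not confirm.

```python
def decode_raw(raw_hex):
    """Read a skdump 'Raw' field such as '0x86bc00000000' as a little-endian integer."""
    digits = raw_hex[2:] if raw_hex.startswith("0x") else raw_hex
    return int.from_bytes(bytes.fromhex(digits), "little")


# (skdump raw field, smartctl RAW_VALUE) pairs copied from the listings above.
samples = {
    "power-on-hours": ("0x86bc00000000", 48262),
    "power-cycle-count": ("0xd70200000000", 727),
    "wear-leveling-count": ("0x3c0200000000", 572),
    "total-lbas-written": ("0xf4201cf60800", 38488776948),
}

for name, (raw, smartctl_value) in samples.items():
    decoded = decode_raw(raw)
    print(f"{name:20} {decoded:>12}  match={decoded == smartctl_value}")

# Assumption: 512 bytes per LBA for attribute 241.
lbas = decode_raw("0xf4201cf60800")
print(f"{lbas * 512 / 1e12:.1f} TB written")
```

If the 512-byte assumption holds, the drive has written roughly 19.7 TB, so the 1291469.049 TB in skdump's "Pretty" column is probably the result of applying a different unit to the same raw number.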



              #7
              I use GSmartControl. It's basically a GUI front end for smartctl. I don't use it a lot, maybe twice a year.

              Disk failure prediction is a black art at best. It's completely probabilistic; a disk might be fine today and then fail to show up on the next reboot.
              The next brick house on the left
              Intel i7 11th Gen | 16GB | 1TB | KDE Plasma 5.24.7 | Kubuntu 22.04.4 | 6.5.0-28-generic




                #8
                Originally posted by jglen490
                ...probabilistic...
                Another high scoring word


                ... and in case anyone wants to look at the suggested application: https://gsmartcontrol.sourceforge.io/home/



                  #9
                  It's also in the repos


                    #10
                    Doesn't work on NVMe drives.



                      #11
                      Not having an NVMe drive, I would ask: what does work on NVMe?


                        #12
                        Originally posted by jglen490
                        Not having an nvme, I would ask what does work on an nvme?
                        smartctl and hdsentinel both read info from NVMe drives, as does the nvme program itself from the nvme-cli package.



                          #13
                          Awesome!


                            #14
                            It is interesting how the info differs...

                            SATA SSD:
                            Code:
                            stuart@office:~$ sudo smartctl -A /dev/sdc
                            smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-45-generic] (local build)
                            Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
                            
                            === START OF READ SMART DATA SECTION ===
                            SMART Attributes Data Structure revision number: 1
                            Vendor Specific SMART Attributes with Thresholds:
                            ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
                              5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
                              9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       48355
                             12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       728
                            177 Wear_Leveling_Count     0x0013   084   084   000    Pre-fail  Always       -       572
                            179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
                            181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
                            182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
                            183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
                            187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
                            190 Airflow_Temperature_Cel 0x0032   067   044   000    Old_age   Always       -       33
                            195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
                            199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       15
                            235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       524
                            241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       38488777260
                            NVMe SSD:
                            Code:
                            stuart@office:~$ sudo smartctl -A /dev/nvme0n1
                            smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-45-generic] (local build)
                            Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
                            
                            === START OF SMART DATA SECTION ===
                            SMART/Health Information (NVMe Log 0x02)
                            Critical Warning:                   0x00
                            Temperature:                        32 Celsius
                            Available Spare:                    100%
                            Available Spare Threshold:          10%
                            Percentage Used:                    1%
                            Data Units Read:                    23,678,927 [12.1 TB]
                            Data Units Written:                 54,044,548 [27.6 TB]
                            Host Read Commands:                 125,235,307
                            Host Write Commands:                593,495,603
                            Controller Busy Time:               2,326
                            Power Cycles:                       116
                            Power On Hours:                     4,236
                            Unsafe Shutdowns:                   66
                            Media and Data Integrity Errors:    0
                            Error Information Log Entries:      31
                            Warning  Comp. Temperature Time:    0
                            Critical Comp. Temperature Time:    0
                            Temperature Sensor 1:               32 Celsius
                            Temperature Sensor 2:               33 Celsius
                            The nvme program gives just slightly more:
                            Code:
                            stuart@office:~$ sudo nvme smart-log /dev/nvme0n1
                            Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
                            critical_warning                    : 0
                            temperature                         : 32 C
                            available_spare                     : 100%
                            available_spare_threshold           : 10%
                            percentage_used                     : 1%
                            data_units_read                     : 23,678,927
                            data_units_written                  : 54,044,632
                            host_read_commands                  : 125,235,316
                            host_write_commands                 : 593,496,256
                            controller_busy_time                : 2,326
                            power_cycles                        : 116
                            power_on_hours                      : 4,236
                            unsafe_shutdowns                    : 66
                            media_errors                        : 0
                            num_err_log_entries                 : 31
                            Warning Temperature Time            : 0
                            Critical Composite Temperature Time : 0
                            Temperature Sensor 1                : 32 C
                            Temperature Sensor 2                : 33 C
                            Thermal Management T1 Trans Count   : 0
                            Thermal Management T2 Trans Count   : 0
                            Thermal Management T1 Total Time    : 0
                            Thermal Management T2 Total Time    : 0
                            Last edited by oshunluvr; Sep 03, 2020, 05:34 AM.
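
Since the NVMe log is a flat list of named fields rather than an attribute table, it is fairly easy to check from a script. The sketch below parses text in the `nvme smart-log` layout shown above and raises a few flags. The three checks reflect the usual reading of these fields (a non-zero critical warning, spare capacity below its threshold, percentage used reaching 100) and are meant as an illustration, not a complete health test; the function names are made up.

```python
SAMPLE = """\
critical_warning                    : 0
temperature                         : 32 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 1%
"""


def parse_smart_log(text):
    """Turn 'key : value' lines from 'nvme smart-log' output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def percent(value):
    """Convert a string such as '10%' to the integer 10."""
    return int(value.rstrip("%"))


def health_flags(fields):
    """Return human-readable warnings for a parsed smart-log dict."""
    flags = []
    if int(fields["critical_warning"], 0) != 0:
        flags.append("critical warning set")
    if percent(fields["available_spare"]) < percent(fields["available_spare_threshold"]):
        flags.append("available spare below threshold")
    if percent(fields["percentage_used"]) >= 100:
        flags.append("estimated endurance used up")
    return flags


print(health_flags(parse_smart_log(SAMPLE)) or "no flags raised")
```

For the drive in this post none of the flags fire, which matches the 1% "percentage used" reading.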



                              #15
                              The nvme program leaves out TB written! Don't NVMe drives have a limit on total TB written that relates directly to the aging of the drive?
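
The nvme listing does include data_units_written; it just does not convert it. NVMe reports data units in thousands of 512-byte blocks, so the conversion is a single multiplication. A short sketch using the figures from the nvme output above (the function name is made up):

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000


def data_units_to_tb(units):
    """Convert an NVMe data-unit count to decimal terabytes."""
    return units * BYTES_PER_DATA_UNIT / 1e12


print(f"{data_units_to_tb(54_044_632):.2f} TB written")
print(f"{data_units_to_tb(23_678_927):.2f} TB read")
```

This gives about 27.67 TB written and 12.12 TB read, in line with the bracketed 27.6 TB and 12.1 TB that smartctl prints for the same drive. As for the aging question, the percentage_used field is the drive's own estimate of how much of its rated endurance has been consumed.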
