[archlinux][hardware] Lifespan analysis of the ThinkPad T450's built-in SSD after using it for bcache

Date: 2021-11-16 08:35:20

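This analysis was prompted by two things I did earlier:

[troubleshoot][archlinux][bcache] Modifying the Linux file system / partition scheme / building a hybrid drive / major system "reincarnation" surgery (restructuring the underlying layout, without reinstalling!)

[archlinux][hardware] Checking an SSD's lifespan

After finishing the low-level disk restructuring on December 6, I collected the following disk metrics: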

/home/tong/Workspace/system/bcache [tong@T7] [:]
> cat
smartctl 6.5 -- r4318 [x86_64-linux-4.8.--ARCH] (local build)
Copyright (C) -, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: SanDisk based SSDs
Device Model: SanDisk SSD U110 16GB
Serial Number:
LU WWN Device Id: 001b44 ec81598d5
Firmware Version: U21B001
User Capacity: ,,, bytes [16.0 GB]
Sector Size: bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 1.8 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS- T13/-D revision
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Dec :: CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( ) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( ) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( ) minutes.
Extended self-test routine
recommended polling time: ( ) minutes.

SMART Attributes Data Structure revision number:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
Reallocated_Sector_Ct -O---- -
Power_On_Hours -O---- -
Power_Cycle_Count -O---- -
Program_Fail_Count -O---- -
Erase_Fail_Count -O---- -
Avg_Write/Erase_Count -O---- -
Unexpect_Power_Loss_Ct -O---- -
Reported_Uncorrect -O---- -
Perc_Write/Erase_Count -O---- -
Perc_Avail_Resrvd_Space PO---- -
Perc_Write/Erase_Ct_BC -O---- -
Total_LBAs_Written -O---- -
Total_LBAs_Read -O---- -
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version
SMART Log Directory Version [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O Log Directory
0x01 GPL,SL R/O Summary SMART error log
0x03 GPL,SL R/O Ext. Comprehensive SMART error log
0x04 GPL,SL R/O Device Statistics log
0x06 GPL,SL R/O SMART self-test log
0x09 GPL,SL R/W Selective self-test log
0x10 GPL,SL R/O SATA NCQ Queued Error log
0x11 GPL,SL R/O SATA Phy Event Counters log
0x30 GPL,SL R/O IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W Host vendor specific log
0xa1 GPL,SL VS Device vendor specific log
0xa2 GPL,SL VS Device vendor specific log
0xa3 GPL,SL VS Device vendor specific log
0xa6-0xa7 GPL,SL VS Device vendor specific log

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: ( sectors)
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number
Note: revision number not implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
Not_testing
Not_testing
Not_testing
Not_testing
Not_testing
Read_scanning was never started
Selective self-test flags (0xffff):
Currently read-scanning the remainder of the disk.
If Selective self-test is pending on power-up, resume after minute delay.

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x05 ===== = = === == Temperature Statistics (rev ) ==
0x05 0x008 --- Current Temperature
0x05 0x010 - --- Average Short Term Temperature
0x05 0x018 - --- Average Long Term Temperature
0x05 0x020 --- Highest Temperature
0x05 0x028 --- Lowest Temperature
0x05 0x030 --- Highest Average Short Term Temperature
0x05 0x038 --- Lowest Average Short Term Temperature
0x05 0x040 - --- Highest Average Long Term Temperature
0x05 0x048 - --- Lowest Average Long Term Temperature
0x05 0x050 --- Time in Over-Temperature
0x05 0x058 --- Specified Maximum Operating Temperature
0x05 0x060 --- Time in Under-Temperature
0x05 0x068 --- Specified Minimum Operating Temperature
0x07 ===== = = === == Solid State Device Statistics (rev ) ==
0x07 0x008 N-- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0003 R_ERR response for device-to-host data FIS
0x0004 R_ERR response for host-to-device data FIS
0x0006 R_ERR response for device-to-host non-data FIS
0x0007 R_ERR response for host-to-device non-data FIS
0x0009 Transition from drive PhyRdy to drive PhyNRdy
0x000a Device-to-host register FISes sent due to a COMRESET
0x000f R_ERR response for host-to-device data FIS, CRC
0x0012 R_ERR response for host-to-device non-data FIS, CRC
0x0001 Command failed due to ICRC error

/home/tong/Workspace/system/bcache [tong@T7] [:]
>

Disk metrics collected on December 6

On December 19 I collected the disk metrics again:

/home/tong/Workspace/system/bcache [tong@T7] [:]
> cat
smartctl 6.5 -- r4318 [x86_64-linux-4.8.--ARCH] (local build)
Copyright (C) -, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: SanDisk based SSDs
Device Model: SanDisk SSD U110 16GB
Serial Number:
LU WWN Device Id: 001b44 ec81598d5
Firmware Version: U21B001
User Capacity: ,,, bytes [16.0 GB]
Sector Size: bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 1.8 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS- T13/-D revision
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Dec :: CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( ) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( ) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( ) minutes.
Extended self-test routine
recommended polling time: ( ) minutes.

SMART Attributes Data Structure revision number:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
Reallocated_Sector_Ct -O---- -
Power_On_Hours -O---- -
Power_Cycle_Count -O---- -
Program_Fail_Count -O---- -
Erase_Fail_Count -O---- -
Avg_Write/Erase_Count -O---- -
Unexpect_Power_Loss_Ct -O---- -
Reported_Uncorrect -O---- -
Perc_Write/Erase_Count -O---- -
Perc_Avail_Resrvd_Space PO---- -
Perc_Write/Erase_Ct_BC -O---- -
Total_LBAs_Written -O---- -
Total_LBAs_Read -O---- -
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version
SMART Log Directory Version [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O Log Directory
0x01 GPL,SL R/O Summary SMART error log
0x03 GPL,SL R/O Ext. Comprehensive SMART error log
0x04 GPL,SL R/O Device Statistics log
0x06 GPL,SL R/O SMART self-test log
0x09 GPL,SL R/W Selective self-test log
0x10 GPL,SL R/O SATA NCQ Queued Error log
0x11 GPL,SL R/O SATA Phy Event Counters log
0x30 GPL,SL R/O IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W Host vendor specific log
0xa1 GPL,SL VS Device vendor specific log
0xa2 GPL,SL VS Device vendor specific log
0xa3 GPL,SL VS Device vendor specific log
0xa6-0xa7 GPL,SL VS Device vendor specific log

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: ( sectors)
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number
Note: revision number not implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
Not_testing
Not_testing
Not_testing
Not_testing
Not_testing
Read_scanning was never started
Selective self-test flags (0xffff):
Currently read-scanning the remainder of the disk.
If Selective self-test is pending on power-up, resume after minute delay.

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x05 ===== = = === == Temperature Statistics (rev ) ==
0x05 0x008 --- Current Temperature
0x05 0x010 - --- Average Short Term Temperature
0x05 0x018 - --- Average Long Term Temperature
0x05 0x020 --- Highest Temperature
0x05 0x028 --- Lowest Temperature
0x05 0x030 --- Highest Average Short Term Temperature
0x05 0x038 --- Lowest Average Short Term Temperature
0x05 0x040 - --- Highest Average Long Term Temperature
0x05 0x048 - --- Lowest Average Long Term Temperature
0x05 0x050 --- Time in Over-Temperature
0x05 0x058 --- Specified Maximum Operating Temperature
0x05 0x060 --- Time in Under-Temperature
0x05 0x068 --- Specified Minimum Operating Temperature
0x07 ===== = = === == Solid State Device Statistics (rev ) ==
0x07 0x008 N-- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0003 R_ERR response for device-to-host data FIS
0x0004 R_ERR response for host-to-device data FIS
0x0006 R_ERR response for device-to-host non-data FIS
0x0007 R_ERR response for host-to-device non-data FIS
0x0009 Transition from drive PhyRdy to drive PhyNRdy
0x000a Device-to-host register FISes sent due to a COMRESET
0x000f R_ERR response for host-to-device data FIS, CRC
0x0012 R_ERR response for host-to-device non-data FIS, CRC
0x0001 Command failed due to ICRC error

/home/tong/Workspace/system/bcache [tong@T7] [:]
>

Disk metrics collected on December 19

The comparison between the two reports:

/home/tong/Workspace/system/bcache [tong@T7] [:]
> diff
1c1
< smartctl 6.5 -- r4318 [x86_64-linux-4.8.--ARCH] (local build)
---
> smartctl 6.5 -- r4318 [x86_64-linux-4.8.--ARCH] (local build)
17c17
< Local Time is: Tue Dec :: CST
---
> Local Time is: Mon Dec :: CST
,63c62,
< Power_On_Hours -O---- -
< Power_Cycle_Count -O---- -
---
> Power_On_Hours -O---- -
> Power_Cycle_Count -O---- -
66c66
< Avg_Write/Erase_Count -O---- -
---
> Avg_Write/Erase_Count -O---- -
69c69
< Perc_Write/Erase_Count -O---- -
---
> Perc_Write/Erase_Count -O---- -
,73c71,
< Perc_Write/Erase_Ct_BC -O---- -
< 241 Total_LBAs_Written -O---- 100 100 000 - 538537024
< 242 Total_LBAs_Read -O---- 100 100 000 - 1275507679
---
> Perc_Write/Erase_Ct_BC -O---- -
> 241 Total_LBAs_Written -O---- 100 100 000 - 598719455
> 242 Total_LBAs_Read -O---- 100 100 000 - 1338182982
126c126
< 0x05 0x008 --- Current Temperature
---
> 0x05 0x008 --- Current Temperature
140c140
< 0x07 0x008 1 1 N-- Percentage Used Endurance Indicator
---
> 0x07 0x008 1 2 N-- Percentage Used Endurance Indicator

/home/tong/Workspace/system/bcache [tong@T7] [:]
>

The key metrics are highlighted in red.
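As a rough illustration, the counter behind this comparison can be pulled out of two saved reports programmatically. The sketch below is not part of the original workflow: `raw_value` is a hypothetical helper, and the two sample strings are the Total_LBAs_Written lines copied from the diff output above.

```python
# Hypothetical helper: extract the RAW_VALUE (last column) of a SMART attribute
# from the text of a saved smartctl report or a diff of two reports.
def raw_value(report: str, name: str) -> int:
    for line in report.splitlines():
        fields = line.split()
        if name in fields:
            # In the attribute table the raw value is the last field on the line.
            return int(fields[-1])
    raise KeyError(f"attribute {name!r} not found")


# Sample lines copied from the diff output in this post.
DEC_06 = "< 241 Total_LBAs_Written -O---- 100 100 000 - 538537024"
DEC_19 = "> 241 Total_LBAs_Written -O---- 100 100 000 - 598719455"

written_before = raw_value(DEC_06, "Total_LBAs_Written")
written_after = raw_value(DEC_19, "Total_LBAs_Written")
written_delta = written_after - written_before

print("LBAs written between the two reports:", written_delta)
```

The same helper works on a full report, since it only looks for a line whose whitespace-separated fields contain the attribute name.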

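From the initial installation on the new machine around the end of October last year until December 6, only 1% of the lifespan had been used. In just 13 days, from December 6 to December 19, the value rose to 2%.

Assuming an SSD lasts 10 years under regular use, the calculation is as follows: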

/home/tong/Workspace/system/bcache [tong@T7] [:]
> bc -l
bc 1.06.
Copyright -, , , , , Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
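538537024 / 365                       # Round the period of regular use before bcache to 365 days.
1475443.90136986301369863013          # Average LBAs written per day under regular use.
(598719455 - 538537024) / 13
4629417.76923076923076923076          # Average LBAs written per day over the past 13 days.
10 / 4                                # Scale the 10-year lifespan by the ratio of the two daily write rates.
2.5000000000000000000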

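Based on the calculation above, with the SSD used as a bcache cache device, the daily write volume is roughly 4 times that of regular use (the two daily averages give a ratio of about 3.1, taken here as 4).

Starting from the usual rule-of-thumb figure of 10 years, the SSD's lifespan under bcache is a quarter of that: 2.5 years.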
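The arithmetic can also be reproduced outside of bc. The following is a minimal sketch, assuming the two Total_LBAs_Written values from the diff output, the 365-day and 13-day periods, and the 10-year baseline stated in this post; the names `lifespan_years`, `regular_per_day` and `bcache_per_day` are made up for illustration.

```python
# Values taken from the diff output in this post (attribute 241, Total_LBAs_Written).
LBAS_DEC_06 = 538_537_024
LBAS_DEC_19 = 598_719_455

REGULAR_DAYS = 365    # period of regular use before bcache, rounded as in the post
BCACHE_DAYS = 13      # December 6 to December 19
BASELINE_YEARS = 10   # rule-of-thumb lifespan under regular use (an assumption of the post)


def lifespan_years(baseline_years: float, write_ratio: float) -> float:
    """Scale the baseline lifespan down by how much faster the drive is written."""
    return baseline_years / write_ratio


regular_per_day = LBAS_DEC_06 / REGULAR_DAYS
bcache_per_day = (LBAS_DEC_19 - LBAS_DEC_06) / BCACHE_DAYS
ratio = bcache_per_day / regular_per_day

print(f"regular use: {regular_per_day:.2f} LBAs/day")
print(f"bcache:      {bcache_per_day:.2f} LBAs/day")
print(f"ratio:       {ratio:.2f}")
print(f"lifespan with the measured ratio: {lifespan_years(BASELINE_YEARS, ratio):.2f} years")
print(f"lifespan with the ratio taken as 4: {lifespan_years(BASELINE_YEARS, 4):.2f} years")
```

Taking the ratio as 4 rather than the measured value makes the estimate more pessimistic, which errs on the safe side when planning a replacement.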

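I allocated only 8 GB, half of the 16 GB SSD, to bcache. Because bcache manages the SSD cache with an LRU policy, using all 16 GB would double the cached content, and the number of reads and writes would naturally double as well. In that case the lifespan would shrink to 1.25 years, which matches 百合's experience exactly: that SSD died after one year.

So, based on the above, this SSD of mine will fail before June 2019.

On Taobao, a used 16 GB NGFF SSD sells for 35. A brand-new 64 GB one costs no more than 150.

I have decided to keep using this SSD as a consumable. The end. :)

Of course, I will keep checking it periodically and update this post as things change.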

====================     update @ 20170624     ========================

smartctl 6.5 -- r4318 [x86_64-linux-4.11.--ARCH] (local build)
Copyright (C) -, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: SanDisk based SSDs
Device Model: SanDisk SSD U110 16GB
Serial Number:
LU WWN Device Id: 001b44 ec81598d5
Firmware Version: U21B001
User Capacity: ,,, bytes [16.0 GB]
Sector Size: bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 1.8 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS- T13/-D revision
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Jun :: CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( ) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( ) seconds.
Offline data collection
capabilities: (0x51) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( ) minutes.
Extended self-test routine
recommended polling time: ( ) minutes.

SMART Attributes Data Structure revision number:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
Reallocated_Sector_Ct -O---- -
Power_On_Hours -O---- -
Power_Cycle_Count -O---- -
Program_Fail_Count -O---- -
Erase_Fail_Count -O---- -
Avg_Write/Erase_Count -O---- -
Unexpect_Power_Loss_Ct -O---- -
Reported_Uncorrect -O---- -
Perc_Write/Erase_Count -O---- -
Perc_Avail_Resrvd_Space PO---- -
Perc_Write/Erase_Ct_BC -O---- -
Total_LBAs_Written -O---- -
Total_LBAs_Read -O---- -
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version
SMART Log Directory Version [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O Log Directory
0x01 GPL,SL R/O Summary SMART error log
0x03 GPL,SL R/O Ext. Comprehensive SMART error log
0x04 GPL,SL R/O Device Statistics log
0x06 GPL,SL R/O SMART self-test log
0x09 GPL,SL R/W Selective self-test log
0x10 GPL,SL R/O SATA NCQ Queued Error log
0x11 GPL,SL R/O SATA Phy Event Counters log
0x30 GPL,SL R/O IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W Host vendor specific log
0xa1 GPL,SL VS Device vendor specific log
0xa2 GPL,SL VS Device vendor specific log
0xa3 GPL,SL VS Device vendor specific log
0xa6-0xa7 GPL,SL VS Device vendor specific log

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: ( sectors)
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number
Note: revision number not implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
Not_testing
Not_testing
Not_testing
Not_testing
Not_testing
Read_scanning was never started
Selective self-test flags (0xffff):
Currently read-scanning the remainder of the disk.
If Selective self-test is pending on power-up, resume after minute delay.

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x05 ===== = = === == Temperature Statistics (rev ) ==
0x05 0x008 --- Current Temperature
0x05 0x010 - --- Average Short Term Temperature
0x05 0x018 - --- Average Long Term Temperature
0x05 0x020 --- Highest Temperature
0x05 0x028 --- Lowest Temperature
0x05 0x030 --- Highest Average Short Term Temperature
0x05 0x038 --- Lowest Average Short Term Temperature
0x05 0x040 - --- Highest Average Long Term Temperature
0x05 0x048 - --- Lowest Average Long Term Temperature
0x05 0x050 --- Time in Over-Temperature
0x05 0x058 --- Specified Maximum Operating Temperature
0x05 0x060 --- Time in Under-Temperature
0x05 0x068 --- Specified Minimum Operating Temperature
0x07 ===== = = === == Solid State Device Statistics (rev ) ==
0x07 0x008 N-- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0003 R_ERR response for device-to-host data FIS
0x0004 R_ERR response for host-to-device data FIS
0x0006 R_ERR response for device-to-host non-data FIS
0x0007 R_ERR response for host-to-device non-data FIS
0x0009 Transition from drive PhyRdy to drive PhyNRdy
0x000a Device-to-host register FISes sent due to a COMRESET
0x000f R_ERR response for host-to-device data FIS, CRC
0x0012 R_ERR response for host-to-device non-data FIS, CRC
0x0001 Command failed due to ICRC error

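Disk metrics collected on June 24, 2017

The lifespan used is currently 23%, which is close to the earlier estimate; it will probably reach 100% in the spring of 2019.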
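As a sanity check on that projection, the two endurance readings can be extrapolated linearly. This is only a sketch under stated assumptions: the December 19 reading is taken to be from 2016 (the post does not state the year), wear is assumed to grow at a constant rate, and `project_full_wear` is a hypothetical helper. The indicator is reported in whole percent, so the result is coarse.

```python
from datetime import date, timedelta


def project_full_wear(d0: date, used0: float, d1: date, used1: float) -> date:
    """Linearly extrapolate two 'percentage used' readings to the date of 100%."""
    elapsed_days = (d1 - d0).days
    rate_per_day = (used1 - used0) / elapsed_days   # percentage points per day
    remaining_days = (100 - used1) / rate_per_day
    return d1 + timedelta(days=remaining_days)


# Readings from this post: 2% on December 19 (year 2016 is an assumption)
# and 23% on June 24, 2017.
projected = project_full_wear(date(2016, 12, 19), 2, date(2017, 6, 24), 23)
print("projected date of 100% used:", projected.isoformat())
```

Under these assumptions the projected date falls in the first half of 2019, in line with the estimate above.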