Disk: SMART analysis

Hello,

I have a few errors in my logs, and after checking with smartctl I have some doubts about the state of my disks.
Two disks are causing problems: a hard disk and an NVMe disk.
All my documents are on the hard disk and my /home is on the NVMe disk, so both hold only important data.
If there is any doubt, I would rather anticipate and replace the disk as early as possible, even though I have a backup.
For the hard disk, I tend to think it needs to be replaced.
For the NVMe disk, I cannot work out what the values mean, so I cannot analyze them.

In any case, I do have errors in the logs.

journalctl -p emerg..err -u smartmontools.service
-- Boot 82bfbab7ae7b43b1ba240d50df9bcafb --
mars 11 22:23:01 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors (changed -1)
mars 11 22:23:01 pc343b smartd[1861]: Device: /dev/nvme0, number of Error Log entries increased from 1352 to 1353
mars 11 22:08:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 11 22:38:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 11 23:08:35 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 11 23:38:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 12 00:08:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 12 00:38:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 12 01:08:35 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 12 01:38:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 12 02:08:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 12 02:38:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
-- Boot 341dd5ddc13b455eaa4b5fa8451921e7 --
mars 12 12:07:44 pc343b smartd[1380]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
mars 12 12:07:44 pc343b smartd[1380]: Device: /dev/nvme0, number of Error Log entries increased from 1353 to 1354
mars 12 11:53:19 pc343b smartd[1380]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors
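
To see at a glance whether these two counters are moving, here is a minimal Python sketch that extracts them from smartd lines like the ones above. The function name `summarize`, the two regular expressions and the `SAMPLE` list are my own assumptions, written only against the message formats shown in this post; they are not part of smartmontools.

```python
import re

# Patterns written against the smartd messages quoted in this post (assumption).
PENDING = re.compile(
    r"Device: (\S+) .*?(\d+) Currently unreadable \(pending\) sectors"
)
ERRLOG = re.compile(
    r"Device: (\S+), number of Error Log entries increased from (\d+) to (\d+)"
)

def summarize(lines):
    """Collect pending-sector counts and NVMe error-log increments per device."""
    pending = {}
    errlog = {}
    for line in lines:
        m = PENDING.search(line)
        if m:
            pending.setdefault(m.group(1), []).append(int(m.group(2)))
            continue
        m = ERRLOG.search(line)
        if m:
            errlog.setdefault(m.group(1), []).append(
                (int(m.group(2)), int(m.group(3)))
            )
    return pending, errlog

# Three lines copied from the journal above.
SAMPLE = [
    "mars 11 22:23:01 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors (changed -1)",
    "mars 11 22:23:01 pc343b smartd[1861]: Device: /dev/nvme0, number of Error Log entries increased from 1352 to 1353",
    "mars 11 22:08:36 pc343b smartd[1861]: Device: /dev/sde [SAT], 7 Currently unreadable (pending) sectors",
]

pending, errlog = summarize(SAMPLE)
print(pending)
print(errlog)
```

In practice the lines would come from the journalctl command above (for example piped into the script on standard input) rather than from a hard-coded list.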

NVMe disk

I do not know how to interpret: "Error Information Log Entries: 1 354"
After some searching, it seems to correspond more or less to errors at system startup, with no impact on the disk.
I have not found any other information, nor what this value is for :confused:
Online, you can find which attributes to watch as signs of an impending failure for a SATA disk, but nothing for NVMe.

nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning			: 0
temperature				: 36°C (309 Kelvin)
available_spare				: 100%
available_spare_threshold		: 5%
percentage_used				: 4%
endurance group critical warning summary: 0
Data Units Read				: 33 067 188 (16,93 TB)
Data Units Written			: 33 773 001 (17,29 TB)
host_read_commands			: 612 204 800
host_write_commands			: 543 455 866
controller_busy_time			: 1947
power_cycles				: 1509
power_on_hours				: 15477
unsafe_shutdowns			: 68
media_errors				: 0
num_err_log_entries			: 1354
Warning Temperature Time		: 0
Critical Composite Temperature Time	: 0
Thermal Management T1 Trans Count	: 0
Thermal Management T2 Trans Count	: 0
Thermal Management T1 Total Time	: 0
Thermal Management T2 Total Time	: 0
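
As a sanity check on the two "Data Units" counters, here is a small Python sketch of the conversion to terabytes. It assumes the NVMe convention that one data unit is 1000 blocks of 512 bytes, and it uses the digits of the two counters above with the thousands separators removed. The name `data_units_to_tb` is hypothetical.

```python
# One NVMe "data unit" is assumed to be 1000 * 512 bytes (512 000 bytes).
DATA_UNIT_BYTES = 1000 * 512

def data_units_to_tb(units):
    """Convert an NVMe data-unit counter to decimal terabytes (10**12 bytes)."""
    return units * DATA_UNIT_BYTES / 1e12

read_tb = data_units_to_tb(33_067_188)
written_tb = data_units_to_tb(33_773_001)
print(f"{read_tb:.2f} TB read, {written_tb:.2f} TB written")
```

The results agree with the figures in parentheses printed by nvme-cli (16,93 TB and 17,29 TB), so the counters themselves look consistent.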




smartctl -a /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-18-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Force MP510
Serial Number:                      192382060001277003FC
Firmware Version:                   ECFM12.2
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 480 103 981 056 [480 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          480 103 981 056 [480 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 2071104643
Local Time is:                      Tue Mar 12 12:08:39 2024 CET
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0054):     DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0c):         Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.12W       -        -    0  0  0  0        0       0
 1 +     6.40W       -        -    1  1  1  1        0       0
 2 +     5.54W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    4%
Data Units Read:                    33 067 214 [16,9 TB]
Data Units Written:                 33 773 180 [17,2 TB]
Host Read Commands:                 612 206 111
Host Write Commands:                543 460 438
Controller Busy Time:               1 947
Power Cycles:                       1 509
Power On Hours:                     15 477
Unsafe Shutdowns:                   68
Media and Data Integrity Errors:    0
Error Information Log Entries:      1 354
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       1354     0  0x5003  0x4004  0x028            0     0     -
  1       1353     0  0x401d  0x4004  0x028            0     0     -
  2       1352     0  0x0010  0x4004  0x028            0     0     -
  3       1351     0  0x501b  0x4004  0x028            0     0     -
  4       1350     0  0x0018  0x4004  0x028            0     0     -
  5       1349     0  0x0005  0x4004  0x028            0     0     -
  6       1348     0  0x0010  0x4004  0x028            0     0     -
  7       1347     0  0x0018  0x4004  0x028            0     0     -
  8       1346     0  0x7006  0x4004  0x028            0     0     -
  9       1345     0  0x5010  0x4004  0x028            0     0     -
 10       1344     0  0xb01d  0x4004  0x028            0     0     -
 11       1343     0  0x0010  0x4004  0x028            0     0     -
 12       1342     0  0x001c  0x4004  0x028            0     0     -
 13       1341     0  0x0018  0x4004  0x028            0     0     -
 14       1340     0  0x0000  0x4004  0x028            0     0     -
 15       1339     0  0x000c  0x4004  0x028            0     0     -
... (47 entries not read)


nvme error-log /dev/nvme0n1
Error Log Entries for device:nvme0n1 entries:63
.................
 Entry[ 0]   
.................
error_count	: 1354
sqid		: 0
cmdid		: 0x5003
status_field	: 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag	: 0
parm_err_loc	: 0x28
lba		: 0
nsid		: 0
vs		: 0
trtype		: The transport type is not indicated or the error is not transport related.
cs		: 0
trtype_spec_info: 0
.................
 Entry[ 1]   
.................
error_count	: 1353
sqid		: 0
cmdid		: 0x401d
status_field	: 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag	: 0
parm_err_loc	: 0x28
lba		: 0
nsid		: 0
vs		: 0
trtype		: The transport type is not indicated or the error is not transport related.
cs		: 0
trtype_spec_info: 0
.................
 Entry[ 2]   
.................
error_count	: 1352
sqid		: 0
cmdid		: 0x10
status_field	: 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag	: 0
parm_err_loc	: 0x28
lba		: 0
nsid		: 0
vs		: 0
trtype		: The transport type is not indicated or the error is not transport related.
cs		: 0
trtype_spec_info: 0
.................
 Entry[ 3]   
.................
error_count	: 1351
sqid		: 0
cmdid		: 0x501b
status_field	: 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag	: 0
parm_err_loc	: 0x28
lba		: 0
nsid		: 0
vs		: 0
trtype		: The transport type is not indicated or the error is not transport related.
cs		: 0
trtype_spec_info: 0
.................
 Entry[ 4]   
.................
error_count	: 1350
sqid		: 0
cmdid		: 0x18
status_field	: 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag	: 0
parm_err_loc	: 0x28
lba		: 0
nsid		: 0
vs		: 0
trtype		: The transport type is not indicated or the error is not transport related.
cs		: 0
trtype_spec_info: 0
.................
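
smartctl prints the status as 0x4004 while nvme-cli prints 0x2002 for the same entries. Below is a minimal Python sketch of how the two can be reconciled, assuming the NVMe layout in which bit 0 of the 16-bit word is the phase tag and the status field sits in bits 15:1 (status code in its low 8 bits, status code type in the next 3, then the "More" and "Do Not Retry" flags). The parameter error location is assumed to hold the byte offset in bits 7:0 and the bit offset in bits 10:8. The function names and the short `GENERIC_STATUS` table are my own, for illustration only.

```python
# A few generic command status codes (status code type 0); not exhaustive.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}

def decode_status(raw16):
    """Decode the 16-bit status word as printed by smartctl (bit 0 = phase tag)."""
    field = raw16 >> 1                 # drop the phase tag, as nvme-cli does
    sc = field & 0xFF                  # status code
    sct = (field >> 8) & 0x7           # status code type
    return {
        "phase_tag": raw16 & 0x1,
        "status_field": field,
        "status_code": sc,
        "status_code_type": sct,
        "more": bool(field & 0x2000),          # more details in the error log
        "do_not_retry": bool(field & 0x4000),
        "meaning": GENERIC_STATUS.get(sc, "unknown") if sct == 0
                   else "non-generic status",
    }

def decode_peloc(peloc):
    """Split the parameter error location into byte, bit and command dword."""
    byte = peloc & 0xFF
    bit = (peloc >> 8) & 0x7
    return {"byte": byte, "bit": bit, "dword": byte // 4}

status = decode_status(0x4004)
location = decode_peloc(0x028)
print(hex(status["status_field"]), status["meaning"])
print(location)
```

Read this way, every entry shown is the same event: a command on submission queue 0 (the admin queue) rejected with "Invalid Field in Command", with the offending field at byte 40 of the command, that is command dword 10. Media errors are counted separately (media_errors is 0 above).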

Hard disk

GNOME Disks reports: "Disk is OK, 7 bad sectors (27 ℃ / 81 ℉)"
Is it time to replace it, given that all my documents are stored on it?

smartctl -a /dev/sde
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-18-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD10EACS-00D6B0
Serial Number:    WD-WCAU40228427
LU WWN Device Id: 5 0014ee 2017df4ba
Firmware Version: 01.01A01
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database 7.3/5319
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s
Local Time is:    Tue Mar 12 11:47:48 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(22200) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x303f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   142   140   021    Pre-fail  Always       -       7858
  4 Start_Stop_Count        0x0032   093   093   000    Old_age   Always       -       7256
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   021   021   000    Old_age   Always       -       57878
 10 Spin_Retry_Count        0x0032   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   094   094   000    Old_age   Always       -       6052
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       224
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       6884
194 Temperature_Celsius     0x0022   120   101   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0
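
To pull out the sector-related attributes from a table like this one, here is a rough Python sketch. The choice of attributes to watch (5, 196, 197 and 198) is my own assumption about which ones matter for bad sectors, and `flag_attributes` is a hypothetical helper that only understands the ten-column layout shown above.

```python
# Attribute IDs assumed to be the sector-related ones worth watching.
WATCHED = {5, 196, 197, 198}

def flag_attributes(table_lines):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in table_lines:
        parts = line.split()
        # Skip headers and anything that is not a full attribute row.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attr_id, name, raw = int(parts[0]), parts[1], parts[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged

# Rows copied from the smartctl output above.
SAMPLE_ROWS = [
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE",
    "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0",
    "196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1",
    "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       7",
    "198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0",
]

flagged = flag_attributes(SAMPLE_ROWS)
print(flagged)
```

On this disk that leaves two non-zero values: one reallocation event and the 7 pending sectors that smartd keeps reporting.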

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     57878         -
# 2  Extended offline    Interrupted (host reset)      60%      9008         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.