Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NVMe Windows self-test seems a bit off #64

Closed
csm10495 opened this issue Mar 25, 2023 · 10 comments
Closed

NVMe Windows self-test seems a bit off #64

csm10495 opened this issue Mar 25, 2023 · 10 comments
Assignees
Labels

Comments

@csm10495
Copy link

csm10495 commented Mar 25, 2023

Describe the bug
When trying to do a self-test with NVMe on Windows, it seems to act a bit weird. For example:

smartctl can see device self test runs:

smartctl -a /dev/sda -i
...
Self-test Log (NVMe Log 0x06)
Self-test status: Extended self-test in progress (16% completed)
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID SCT Code
 0   Extended          Completed without error                3441            -     -   -    -
 1   Short             Completed without error                3441            -     -   -    -
 2   Short             Completed without error                1361            -     -   -    -
 3   Extended          Completed without error                 866            -     -   -    -
 4   Short             Aborted: Self-test command              865            -     -   -    -
 5   Short             Completed without error                 340            -     -   -    -

pySMART can't see the same

In [15]: d.dev_reference
Out[15]: '/dev/sda'

In [16]: d.update()

In [17]: d.tests
Out[17]: []

In [18]: d.get_selftest_result()
Out[18]: (2, 'No new self-test results found.', None)

In [19]: import subprocess; subprocess.call('smartctl -l selftest /dev/sda -d nvme', shell=True)
smartctl pre-7.4 2023-03-21 r5470 [x86_64-w64-mingw32-w10-22H2] (CircleCI)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Self-test Log (NVMe Log 0x06)
Self-test status: Extended self-test in progress (59% completed)
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID SCT Code
 0   Extended          Completed without error                3441            -     -   -    -
 1   Short             Completed without error                3441            -     -   -    -
 2   Short             Completed without error                1361            -     -   -    -
 3   Extended          Completed without error                 866            -     -   -    -
 4   Short             Aborted: Self-test command              865            -     -   -    -
 5   Short             Completed without error                 340            -     -   -    -

Out[19]: 0

In [20]: d.get_selftest_result()
Out[20]: (2, 'No new self-test results found.', None)

In [21]: d.update()

In [22]: d.tests
Out[22]: []

In [23]: d
Out[23]: <NVME device on /dev/sda mod:Samsung SSD 980 PRO 1TB sn:S5P2NG0NB05964V>

Raw outputs
Please, provide the following raw command outputs to be added to our test-base. Notify explicitly if you don't want them to be incorporated. In any case, they are "mostly required" to check and verify your issue/bug.

C:\WINDOWS\system32>smartctl --scan-open
/dev/sda -d nvme # /dev/sda, NVMe device
/dev/sdb -d nvme # /dev/sdb, NVMe device
/dev/sdc -d scsi # /dev/sdc, SCSI device
C:\WINDOWS\system32>smartctl --all /dev/sda -d nvme
smartctl pre-7.4 2023-03-21 r5470 [x86_64-w64-mingw32-w10-22H2] (CircleCI)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 980 PRO 1TB
Serial Number:                      S5P2NG0NB05964V
Firmware Version:                   5B2QGXA7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            471,137,140,736 [471 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 bb01501f70
Local Time is:                      Sat Mar 25 16:14:42 2023 PDT
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.49W       -        -    0  0  0  0        0       0
 1 +     4.48W       -        -    1  1  1  1        0     200
 2 +     3.18W       -        -    2  2  2  2        0    1000
 3 -   0.0400W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        62 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    13%
Data Units Read:                    151,288,508 [77.4 TB]
Data Units Written:                 264,489,347 [135 TB]
Host Read Commands:                 5,047,031,068
Host Write Commands:                7,842,130,574
Controller Busy Time:               19,134
Power Cycles:                       662
Power On Hours:                     3,441
Unsafe Shutdowns:                   36
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               62 Celsius
Temperature Sensor 2:               72 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: Extended self-test in progress (28% completed)
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID SCT Code
 0   Extended          Completed without error                3441            -     -   -    -
 1   Short             Completed without error                3441            -     -   -    -
 2   Short             Completed without error                1361            -     -   -    -
 3   Extended          Completed without error                 866            -     -   -    -
 4   Short             Aborted: Self-test command              865            -     -   -    -
 5   Short             Completed without error                 340            -     -   -    -


Environmental setup:
Note i'm using a nightly build of smartctl since the latest released doesn't work properly with nvme self-tests.

C:\WINDOWS\system32>smartctl --version
smartctl pre-7.4 2023-03-21 r5470 [x86_64-w64-mingw32-w10-22H2] (CircleCI)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

smartctl comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License; either
version 2, or (at your option) any later version.
See https://www.gnu.org for further details.

smartmontools pre-release 7.4
smartmontools SVN rev 5470 dated 2023-03-21 at 19:47:46
smartmontools build host: x86_64-w64-mingw32
smartmontools build with: C++11, GCC 7.3-win32 20180312, MinGW-w64 5.0
smartmontools configure arguments: 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-w64-mingw32'

C:\WINDOWS\system32>pip freeze | findstr -i smart
pySMART==1.2.3

Additional context
Optimally it would be nice to be able to run and query self tests from py-SMART

@ralequi
Copy link
Collaborator

ralequi commented Mar 27, 2023

Hi Charles @csm10495,

I'm going pick your raw data and check what is going wrong.

In any case, to your comment

Optimally it would be nice to be able to run and query self tests from py-SMART

Note that there are methods test_start & test_stop exactly to do what you are requesting on the pysmart object and run_selftest on the device object

@ralequi ralequi added the bug label Mar 27, 2023
@ralequi
Copy link
Collaborator

ralequi commented Mar 27, 2023

Just to take into account:

  • NVME tests (read & write) are only supported on smartctl >= 7.4

ralequi added a commit that referenced this issue Mar 27, 2023
Fixes issue NVMe Windows self-test seems a bit off #64
@ralequi
Copy link
Collaborator

ralequi commented Mar 27, 2023

Hi again @csm10495

Please, check the develop branch and tell me if everything works fine now.

@csm10495
Copy link
Author

Cool. Getting closer: though i noticed printing a test doesn't work:

In [13]: t = d.tests[0]

In [14]: t
Out[14]: <SMART Self-test [Extended|Completed without error] hrs:3442 LBA:None>

In [15]: print(t)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [15], line 1
----> 1 print(t)

File ~\Desktop\py-SMART\pySMART\testentry.py:170, in TestEntry.__str__(self)
    153     return ("{0:>2} {1:17}{2:23}{3:7}{4:14}[{5:4}{6:5}{7:4}]".format(
    154         self.num,
    155         self.type,
   (...)
    161         self.ASCQ
    162     ))
    163 elif self._format == 'nvme':
    164     ## NVME FORMAT ##
    165     # Example smartctl output
   (...)
    168     # Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID SCT Code
    169     #  0   Extended          Completed without error                3441            -     -   -    -
--> 170     return ("{0:>2} {1:18}{2:29}{3:16}{4:13}{5:5}{6:4}{7:4}".format(
    171         self.num,
    172         self.type,
    173         self.status,
    174         self.hours,
    175         self.LBA,
    176         self.nsid,
    177         self.sct,
    178         self.code
    179     ))

TypeError: unsupported format string passed to NoneType.__format__

In [16]: d
Out[16]: <NVME device on /dev/sda mod:Samsung SSD 980 PRO 1TB sn:S5P2NG0NB05964V>

In [17]: d.tests
Out[17]:
[<SMART Self-test [Extended|Completed without error] hrs:3442 LBA:None>,
 <SMART Self-test [Extended|Completed without error] hrs:3441 LBA:None>,
 <SMART Self-test [Short|Completed without error] hrs:3441 LBA:None>,
 <SMART Self-test [Short|Completed without error] hrs:1361 LBA:None>,
 <SMART Self-test [Extended|Completed without error] hrs:866 LBA:None>,
 <SMART Self-test [Short|Aborted: Self-test command] hrs:865 LBA:None>,
 <SMART Self-test [Short|Completed without error] hrs:340 LBA:None>]

I also tried doing a run_selftest_and_wait() and it looks like it returned the last run test immediately instead of waiting for the new test to start. Running it again gave back 'self test already in progress'.:

In [18]: d.run_selftest_and_wait('short')
Out[18]: (0, <SMART Self-test [Extended|Completed without error] hrs:3442 LBA:None>)

In [19]: d.run_selftest_and_wait('short')
Out[19]: (1, 'Self-test already in progress. Please wait.')

When running a long test, i'm not quite sure it did anything, though it should have worked:

In [44]: d.run_selftest_and_wait('long')
Out[44]: (0, <SMART Self-test [Short|Completed without error] hrs:3473 LBA:None>)

In [45]: d.run_selftest_and_wait('long')
Out[45]: (0, <SMART Self-test [Short|Completed without error] hrs:3473 LBA:None>)

In [46]: d.test_capabilities
Out[46]:
{'offline': False,
 'short': True,
 'long': True,
 'conveyance': False,
 'selective': False}

ralequi added a commit that referenced this issue Mar 28, 2023
(issue NVMe Windows self-test seems a bit off #64)
@ralequi
Copy link
Collaborator

ralequi commented Mar 28, 2023

Hi @csm10495

I made a couple of fixes. I think it should work now, or at least everything I've tested does.
Please, check them when you have time.

@csm10495
Copy link
Author

csm10495 commented Mar 29, 2023

Getting closer. Now it seems like it ran a test, but only once. Now if I try to run another, it seems to immediately pass but not add to d.tests which makes me think its not kicking the test again.

In [33]: d.run_selftest_and_wait('short')
Out[33]: (0, <SMART Self-test [Short|Completed without error] hrs:3481 LBA:None>)

In [34]: len(d.tests)
Out[34]: 9

In [35]: d.run_selftest_and_wait('short')
Out[35]: (0, <SMART Self-test [Short|Completed without error] hrs:3481 LBA:None>)

In [36]: len(d.tests)
Out[36]: 9

In [37]: d.run_selftest_and_wait('short')
Out[37]: (0, <SMART Self-test [Short|Completed without error] hrs:3481 LBA:None>)

In [38]: len(d.tests)
Out[38]: 9

In [39]: print(d.tests[0])
 0   Short             Completed without error                3481            -     -   -    -

In [40]: d.update()

In [41]: len(d.tests)
Out[41]: 9

...


In [44]: d.run_selftest_and_wait('long')
Out[44]: (0, <SMART Self-test [Short|Completed without error] hrs:3481 LBA:None>)

In [45]: d.update()

In [46]: len(d.tests)
Out[46]: 9


@ralequi
Copy link
Collaborator

ralequi commented Mar 29, 2023

Hi @csm10495

Locally, when I've tested it, it worked perfectly.
9 may be a limit in how many tests are reported on smartctl.

Could you please check that's correct? I don't see any logic path on the code that may it work only once. You can also test "short" and "long" alternatively so it's easier to view if new tests are appearing.

@csm10495
Copy link
Author

I think smartctl itself is messing something up here. It doesn't seem to want to actually kick the tests, but when i use Samsung Magician to kick the test, it can report the test status just fine. Similarly aborting the test doesn't do anything with smartctl but works fine with Magician. Looks like the tests appear in smartctl's listing after completion/abortion still.

I'm guessing that the issues I'm seeing are directly because smartctl (and not necessarily py-SMART) is acting funky.

Here's a video to show the full behavior and testing:

Video.mp4

@csm10495
Copy link
Author

I think from py-SMART's perspective ya did what you could. I've filed: https://www.smartmontools.org/ticket/1708 with smartmontools. I think this is ready to be closed out.

Thanks!

@ralequi
Copy link
Collaborator

ralequi commented Mar 30, 2023

Thank you Charles!

Don't hesitate to open any new issue you found out there!

@ralequi ralequi closed this as completed Mar 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants