Intelligent NVMe SSD Testing

By: Bill Andrews | June 18, 2020 | SANBlaze


SANBlaze will alert you when something needs your attention


Alerts, Warnings, Skipped and Paused Tests


One of the challenges of NVMe SSD testing today is getting real-time status on how the testing is progressing. SANBlaze addresses this by building notification capability into its interface through the Test Error Control. The system reports status in real time, so you no longer have to log in to each test system and manually check each test for failures.


While this works for both SCSI and NVMe devices, the examples here focus on NVMe.


Here is a list of events that a user can take action on:

Path Lost

Path Failed

Residual != 0

SCSI Status (SCSI only)

NVMe Status (NVMe only)

Data Miscompare

I/O Timeout


As you add error actions, the web page builds and displays a Test Control string in the following format: Type, action, status, key, asc, ascq

Type – 1 (Path failed); 2 (Residual mismatch); 3 (SCSI status); 4 (Data miscompare); 5 (NVMe status); 6 (I/O timeout); 7 (Path lost)

Action – 0 (ignore); 1 (log); 2 (retry); 3 (fail); 4 (alert)

Status – for SCSI or NVMe status only, SCSI/NVMe hex status value

Key – for SCSI status only, SCSI hex sense key value

ASC/ASCQ – for SCSI status only, SCSI hex ASC/ASCQ value

Change – apply the change.
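The numeric codes above can be captured in a small helper. The following Python sketch is illustrative only: the function name build_test_control, the comma separator without spaces, the bare lowercase hex rendering, and the choice to leave out fields that do not apply are all assumptions rather than documented SANBlaze behavior. The web page remains the authority on the exact string it builds.

```python
from enum import IntEnum


class ErrorType(IntEnum):
    """Type codes as listed in the article."""
    PATH_FAILED = 1
    RESIDUAL_MISMATCH = 2
    SCSI_STATUS = 3
    DATA_MISCOMPARE = 4
    NVME_STATUS = 5
    IO_TIMEOUT = 6
    PATH_LOST = 7


class Action(IntEnum):
    """Action codes as listed in the article."""
    IGNORE = 0
    LOG = 1
    RETRY = 2
    FAIL = 3
    ALERT = 4


def build_test_control(err_type, action, status=None, key=None, asc=None, ascq=None):
    """Return a 'type,action[,status[,key,asc,ascq]]' string.

    Assumption: fields are comma-separated, hex values are written without a
    '0x' prefix, and fields that do not apply are left out.
    """
    err_type, action = ErrorType(err_type), Action(action)
    fields = [str(int(err_type)), str(int(action))]

    takes_status = err_type in (ErrorType.SCSI_STATUS, ErrorType.NVME_STATUS)
    if status is not None:
        if not takes_status:
            raise ValueError("status applies to SCSI or NVMe status types only")
        fields.append(format(status, "x"))

    sense = (key, asc, ascq)
    if any(value is not None for value in sense):
        if err_type is not ErrorType.SCSI_STATUS:
            raise ValueError("key/asc/ascq apply to the SCSI status type only")
        if status is None or any(value is None for value in sense):
            raise ValueError("status, key, asc and ascq must be given together")
        fields.extend(format(value, "x") for value in sense)

    return ",".join(fields)


# Alert on a data miscompare (type 4, action 4).
print(build_test_control(ErrorType.DATA_MISCOMPARE, Action.ALERT))  # 4,4
# Log a SCSI CHECK CONDITION (status 02) with sense key 5, ASC 24, ASCQ 00.
print(build_test_control(ErrorType.SCSI_STATUS, Action.LOG, 0x02, 0x05, 0x24, 0x00))  # 3,1,2,5,24,0
```

The validation mirrors the field notes above: a status value is accepted only for the SCSI and NVMe status types, and the sense key and ASC/ASCQ values only for the SCSI status type.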


This script interface allows a SANBlaze user to tune the type of notification they wish to receive.

An Alert script can be enabled to run when any test failure is detected. The default Alert script is template.sh. To set up alerts, you must enable the Alert feature as follows:


Set Action to "Alert" and "Add"


1. ssh into the system as root/SANBlaze

2. cp /virtualun/lundata/initscriptfiles/template.sh /virtualun/lundata/initscriptfiles/template_custom.sh

3. vi /virtualun/lundata/initscriptfiles/template_custom.sh and make modifications

4. Exit from the SSH session

5. Go back to the GUI and refresh the Init NVMe Test Tab

6. Then select the Type and select one of the options


7. Next select Action and then select Alert



8. Next select Add to add this Error Control type



9. Select your modified alert script and click the Select button.



10. Create a new test from the New Disk Test menu.

11. The alert script will be run on a test failure.


Alert on a Data Miscompare


A new test error control type, Data Miscompare, has also been added. Selecting the Alert action together with a custom Alert script causes a miscompare to trigger the alert script. This is useful in "lights out" test scenarios to send an email or other notification to the user.


Note: The Data Miscompare error control is always considered fatal. Testing will stop on a Miscompare.
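For a lights-out run, the notification itself might be an email. The article does not describe what template.sh contains or what arguments the system passes to an alert script, so the Python sketch below takes the failure details as plain parameters. The function build_alert_email, the addresses, and the system and device names in the usage are hypothetical. The sketch only composes the message with Python's standard library; delivery (for example through smtplib) depends on your site's mail setup.

```python
from email.message import EmailMessage


def build_alert_email(sender, recipient, system, device, test_name, event):
    """Compose (but do not send) a notification email for a test event."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{system}] {event} on {device}"
    msg.set_content(
        f"Event:  {event}\n"
        f"Device: {device}\n"
        f"Test:   {test_name}\n"
        "Check the test log on the system for details.\n"
    )
    return msg


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    alert = build_alert_email(
        "sanblaze@example.com", "lab-oncall@example.com",
        "test-system-1", "target100 lun1",
        "Compare_Seq_64thr_4096blk.sh", "Data Miscompare",
    )
    print(alert["Subject"])
    print(alert.get_content())
```

Keeping message construction separate from delivery makes the wording easy to check before the script is ever wired to a mail server.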


The Advantages of a Skipped Test

Did you know that SANBlaze will show a test as “Skipped” if a device does not support a feature, in accordance with the NVMe specification? The test is skipped rather than reported as a false positive or false negative, which saves you time when validating your test results.

Click the Show Tests button from the SBExpress Manager page to assign tests.

Once you have assigned tests to Namespaces using Test Manager from the left-hand menu, the SBExpress Manager page will display the selected tests in the middle section of the page:



Figure 1: SBExpress Manager Tests


Note that the graphing area does not populate until I/O testing begins.


Use the Start button to start all selected tests. The tests are run in sequence starting from the lowest selected number in the Seq (Sequence) column.


To run a single test on its own, uncheck the All box, select the test, and click Start.


While testing is in progress, the page updates automatically every 5 seconds. Tests are marked as follows:

  • Green – Passed

  • Yellow – Running

  • Salmon – Paused

  • Red – Failed

  • Blue – Stopped

  • Orange – Warning

  • Purple – Skipped (e.g., the controller doesn’t support the test command or feature)
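If you track results outside the GUI, the same legend is easy to mirror in a script. The Python sketch below is a minimal illustration: the STATE_COLORS table restates the list above, while the summarize helper and the sample states are hypothetical.

```python
from collections import Counter

# Color legend from the SBExpress Manager page, keyed by test state.
STATE_COLORS = {
    "Passed": "Green",
    "Running": "Yellow",
    "Paused": "Salmon",
    "Failed": "Red",
    "Stopped": "Blue",
    "Warning": "Orange",
    "Skipped": "Purple",
}


def summarize(states):
    """Count tests per state, rejecting states that are not in the legend."""
    unknown = set(states) - set(STATE_COLORS)
    if unknown:
        raise ValueError(f"unknown test state(s): {sorted(unknown)}")
    counts = Counter(states)
    # Report in legend order so the output is stable.
    return {state: counts[state] for state in STATE_COLORS if counts[state]}


# Hypothetical run: four tests and their final states.
print(summarize(["Passed", "Skipped", "Passed", "Warning"]))
# {'Passed': 2, 'Warning': 1, 'Skipped': 1}
```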

Below is an example of a test sequence in progress:



Figure 2: SBExpress Manager Testing Progress

You may pause testing at any time during the test run. Pausing suspends the currently running tests, and the time spent paused is deducted from the total testing time. Use Unpause to resume testing or Stop to stop testing. Unpause continues from the point at which the test was paused; Stop followed by Start runs the tests again from the beginning.
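The pause accounting can be illustrated with a short calculation. This Python sketch is not SANBlaze code: effective_test_seconds and the timestamps are hypothetical and simply show paused intervals being deducted from the wall-clock duration of a run.

```python
from datetime import datetime


def effective_test_seconds(start, end, pauses):
    """Wall-clock run time minus the time spent paused.

    pauses is a list of (paused_at, unpaused_at) datetime pairs that fall
    inside the run and do not overlap.
    """
    total = (end - start).total_seconds()
    paused = 0.0
    for paused_at, unpaused_at in pauses:
        if not (start <= paused_at <= unpaused_at <= end):
            raise ValueError("pause interval must lie within the run")
        paused += (unpaused_at - paused_at).total_seconds()
    return total - paused


# Hypothetical run: 10 minutes of wall-clock time with one 2-minute pause.
run_start = datetime(2020, 6, 3, 16, 0, 0)
run_end = datetime(2020, 6, 3, 16, 10, 0)
pause = (datetime(2020, 6, 3, 16, 4, 0), datetime(2020, 6, 3, 16, 6, 0))
print(effective_test_seconds(run_start, run_end, [pause]))  # 480.0
```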


To rerun a test, you must first clear its state. Clear resets the test to the Idle state and removes the test results. Results are not currently archived, so you must manually move any results you want to keep.

To act on any single test, for example to re-run a failed test alone, uncheck the All checkbox and select the individual test. Click Clear to clear the old results and then click Start to run it.

Examples


Here are some examples of the Warning, Skipped, and Paused states from actual test runs.


Test State – Warning


After selecting the Section 11 IOL NVMe Certification tests, the test engineer observes that the V12_IOL_NVMe_01.02_SetGetFeatures.sh script has been marked with a Warning status. Upon checking the report file, the engineer finds the following warning message:

Wed May 27 15:13:48 2020 WARNING:

Set Features command passed but should have failed with CMD_SEQ_ERROR. The spec says 'If a Set Features command is issued for this feature after creation of any I/O Submission and/or I/O Completion Queues, then the Set Features command shall fail with status code of Command Sequence Error. The controller shall not change the value allocated between resets.'


Upon review with the development team, it was determined that this is a controller firmware issue that needs to be addressed.
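Warnings like this one can also be pulled out of a report file by script instead of by eye. The Python sketch below assumes that every entry starts with a timestamp and a level such as WARNING, a layout inferred from the excerpts in this article rather than from a documented format. The helper name entries_at_level is hypothetical. Lines that do not start a new entry are treated as a continuation of the previous one.

```python
import re

# Entries start with a timestamp such as "Wed May 27 15:13:48 2020" followed
# by a level and a colon; this layout is inferred from the excerpts shown.
ENTRY = re.compile(
    r"^(?P<stamp>\w{3} \w{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<level>[A-Z]+):\s*(?P<text>.*)$"
)


def entries_at_level(lines, level):
    """Yield (timestamp, message) for each entry at the given level."""
    current = None  # (level, timestamp, list of message parts)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match:
            # A new entry begins, so emit the previous one if it qualifies.
            if current and current[0] == level:
                yield current[1], " ".join(current[2])
            parts = [match["text"]] if match["text"] else []
            current = (match["level"], match["stamp"], parts)
        elif current:
            current[2].append(line)  # continuation of the previous entry
    if current and current[0] == level:
        yield current[1], " ".join(current[2])


# First sentence of the warning quoted above, used as sample input.
report = """\
Wed May 27 15:13:48 2020 WARNING:

Set Features command passed but should have failed with CMD_SEQ_ERROR.
"""
for stamp, message in entries_at_level(report.splitlines(), "WARNING"):
    print(stamp, "->", message)
```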


Test State – Skipped


After selecting the Section 06 Level 02 NVMe MI Full Command Set tests, the test engineer discovers that the SANBlaze-supported MI-VDM tests have been marked as Skipped.


Here is an example from the test report for the VMe_MI_VDM_ConfigurationGet.sh script.



SANBlaze automatically skipped this test instead of reporting it as a false positive or false negative, saving the test engineer considerable time.


Test State – Paused


/virtualun/webs/web/rest/sanblazes/1/ports/0/targets/100/luns/1/tests/1/Compare_Seq_64thr_4096blk.sh.log

Wed Jun 03 16:28:52 2020 DETAIL: Compare_Seq_64thr_4096blk.sh Starting

Wed Jun 03 16:28:52 2020 DETAIL: Compare_Seq_64thr_4096blk.sh Version=V1.2_2020/05/19

Wed Jun 03 16:28:52 2020 DETAIL: SANBlaze_Test_Include Version=V1.39_2020/05/25

Wed Jun 03 16:28:52 2020 DETAIL: SBIO_Include Version=V1.4_2020/05/23

Wed Jun 03 16:28:52 2020 DETAIL: System software version is V8.1-64-Beta5 built on May 28 2020 at 14:21:20

Wed Jun 03 16:28:52 2020 ACTION: Issuing command 'nvme-ns-mgmt /iport0/target100 show -all -timeout 90'

Wed Jun 03 16:28:52 2020 DETAIL: Namespace1: 976773168 Blocks, 500107862016 Bytes (465.76 GB), 512 Bytes/Block, Protection Type 0, Private, Attached

Wed Jun 03 16:28:53 2020 ACTION: sbecho WriteEnabled=1 > /proc/vlun/ioc0/sim/target100lun1

Wed Jun 03 16:28:53 2020 ACTION: sbecho Test: File=/virtualun/rest/scripts/IO_default_configuration Compare_1_0_100_1_1,64,4096,0,0,10,1,0,-1,0,1,1,1,0 > /proc/vlun/ioc0/sim/target100lun1

Wed Jun 03 16:30:10 2020 ACTION: sbecho PauseTests=Compare_1_0_100_1_1 > /proc/vlun/ioc0/sim/target100lun1

In this example, the test engineer paused the Compare test to check the internal logs and then restarted the test.
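The timestamps in the log make it easy to work out how long the test ran before it was paused. The Python sketch below parses the leading timestamp of the two relevant ACTION lines and subtracts them. The helper names are hypothetical, the timestamp layout is inferred from the log above, and parsing the day and month names relies on an English locale. It also assumes the test started running at the moment the Test: command was issued.

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Wed Jun 03 16:28:53 2020"
STAMP_LENGTH = len("Wed Jun 03 16:28:53 2020")


def stamp_of(line):
    """Parse the leading timestamp of a log line into a datetime."""
    return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)


def seconds_until_pause(lines):
    """Seconds between the first 'Test:' action and the first 'PauseTests='.

    Raises StopIteration if either line is missing from the log.
    """
    started = next(stamp_of(line) for line in lines if "sbecho Test:" in line)
    paused = next(stamp_of(line) for line in lines if "sbecho PauseTests=" in line)
    return (paused - started).total_seconds()


# The two ACTION lines from the log above.
log = [
    "Wed Jun 03 16:28:53 2020 ACTION: sbecho Test: "
    "File=/virtualun/rest/scripts/IO_default_configuration "
    "Compare_1_0_100_1_1,64,4096,0,0,10,1,0,-1,0,1,1,1,0 "
    "> /proc/vlun/ioc0/sim/target100lun1",
    "Wed Jun 03 16:30:10 2020 ACTION: sbecho PauseTests=Compare_1_0_100_1_1 "
    "> /proc/vlun/ioc0/sim/target100lun1",
]
print(seconds_until_pause(log))  # 77.0
```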


© 2020 SANBlaze Technology, Inc.
