NVVS deprecated?

Newer releases of DCGM state that NVVS has been deprecated and that dcgmi diag should be used instead. This is unfortunate, as many of the tests that could be run directly are now hidden, and we only receive “Pass/Fail” for those tests.
I require the ability to run power/thermal stress tests, a PCIe bandwidth test, and other NVVS tests, and the only publicly available tool I know of for doing this is NVVS.
If the specific tests available through NVVS are not going to be supported going forward, I will have to develop my own equivalent, or (hopefully) NVIDIA will open-source NVVS. This code is not rocket science, so developing equivalent tests is not a huge undertaking, but it does seem like a wasted effort since NVIDIA already wrote it.
Please advise as to which course of action I should take to retain this capability.

Thanks!
Doug

I would appreciate a reply from someone at NVIDIA as to your support plans for NVVS going forward. Thanks!

Hi Doug,

What has been deprecated is the standalone use of NVVS. All of its functionality has been subsumed into DCGM (where it is called Diagnostics) and is available via “dcgmi diag”, as you correctly noted. There is a one-to-one mapping of all the NVVS CLI options to dcgmi. Please see the documentation here: Data Center GPU Manager User Guide :: GPU Deployment and Management Documentation

You can continue to specify which tests you want to run as part of DCGM Diagnostics. We also have verbose options and logging that will provide more information on the tests - you can also use configuration files to customize the parameters of the tests (same workflow as NVVS).
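As a rough illustration of the workflow described above, the following Python sketch assembles a “dcgmi diag” command line with a run level, an optional config file, and the verbose option. The helper name build_diag_command and the file name my_tests.yaml are hypothetical, and the flags used (-r, -c, -v) and the run levels 1 to 3 are assumptions that should be checked against “dcgmi diag --help” for the installed DCGM version.

```python
import shlex
import shutil
from typing import List, Optional


def build_diag_command(run_level: int,
                       config_path: Optional[str] = None,
                       verbose: bool = False) -> List[str]:
    """Assemble an argv list for `dcgmi diag` (hypothetical helper).

    Assumes -r selects the run level, -c names a config file, and
    -v enables verbose output; verify with `dcgmi diag --help`.
    """
    if run_level not in (1, 2, 3):
        raise ValueError("run_level must be 1, 2, or 3")
    cmd = ["dcgmi", "diag", "-r", str(run_level)]
    if config_path:
        cmd += ["-c", config_path]
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    # my_tests.yaml is a placeholder for an existing NVVS-style config.
    cmd = build_diag_command(3, config_path="my_tests.yaml", verbose=True)
    print(shlex.join(cmd))
    # Only report whether the tool is installed; running it is left to the user.
    print("dcgmi found on PATH:", shutil.which("dcgmi") is not None)
```

The returned list can be passed to subprocess.run on a host where DCGM is installed. Building the command as a list rather than a single string avoids shell quoting problems when the config path contains spaces.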

Are there specific issues or deficiencies you are observing with “dcgmi diag”? Please report these; we will be happy to investigate and will attempt to address them in upcoming releases.

Thanks!

Thank you for your response. I have gone through this page, and that is what prompted me to ask the question on this forum.
The bulk of the document describes how to use nvvs, with that section preceded by the statement that it has been deprecated. The ‘dcgmi diag’ command has an option for a config file, but there is no documentation on what a dcgmi config file looks like. I tried to use some of the YAML config files I created for nvvs, but it appears that the -c parameter is totally ignored in dcgmi diag.
I’m running version 1.6.3, BTW.

Is it possible that the nvvs features have not yet been subsumed? I want to continue to be able to run targeted stress, targeted power and memory/pci bandwidth tests, and I’m hoping the YAML config files I’ve written will still work.

Thanks!
Doug

Hi Doug,

>>> appears that the -c parameter is totally ignored in dcgmi diag

This is not what we observed in our testing, and our test plan includes coverage for this option. Would it be possible for you to share a few of your current NVVS YAML configs so we can investigate?

If you prefer, you can attach these via a bug (My Account → My Bugs → Submit a New Bug). Please let me know the bug number and I will route it internally.

Thanks for your help!
Pramod