Latest Nsight Systems and Nvidia Driver aren't compatible?

Hi… trying to use Nsight Systems to graph CPU and GPU activity. I previously got this working with nvvp, including NVTX annotations, and am trying to “upgrade” to the newer tool set. Nsight Systems 2020.1 wasn’t able to get any info about the GPU; a warning in Nsight Systems said to update the driver. I updated via GeForce Experience to the latest driver (nvidia-smi shows 445.87 and CUDA 11.0) and updated Nsight Systems to 2020.2. With this configuration, Nsight Systems can’t connect to the application at all, and it also reports it is not compatible with CUDA 11.0. What is the right combination of driver + Nsight Systems that should work on Windows 10?

Thanks,

  • Josh.

Additional data: I reverted my driver version in the Windows 10 Device Manager. nvidia-smi now shows driver 442.23 and CUDA version 10.2. With Nsight Systems 2020.2 I can get CPU details, but no GPU details. The error in Nsight Systems is “Incompatible CUDA driver version. Please try updating the CUDA driver or use more recent profiler version.”

Found an advanced search that shows historical GeForce Game Ready driver releases, but the descriptions don’t show the corresponding CUDA version of each driver. SMH. I couldn’t find a table anywhere that maps driver releases to CUDA versions.

Attempted to reinstall CUDA 10.2, which looked like it would reinstall the driver and NVIDIA CUDA tools at specific, mutually compatible versions. I didn’t catch the exact driver version it was going to install, but noted that it was earlier than 442.23. After the CUDA installation, nvidia-smi still shows 442.23, so I presume the CUDA installer chose not to downgrade the driver.

CUDA 10.2 installed Nsight Systems 2019.5.2. With this version and driver 442.23, I can get GPU memory usage, but no info on GPU kernels. I do see NVTX annotations, however. The Nsight Systems error is “Incompatible CUDA driver version. Please try updating the CUDA driver or use more recent profiler version.”

Reran the CUDA 10.2 install. It wanted to install driver 441.22.

Looked through the historical driver installs. There’s no 441.22 that’s available to install. ::cry::

Guessing that 441.41 must be the closest compatible driver.

(Side quest: Had a bit of a struggle getting a driver to install. Ended up that I needed DCH driver type rather than standard.)

Was able to install driver 441.66 with CUDA 10.2. Seeing the same result with Nsight Systems 2019.5.2: it can see the CPU side of things but no GPU results. “Incompatible CUDA driver version. Please try updating the CUDA driver or use more recent profiler version.”

I’m out of ideas.

And just to confirm, nvvp works fine with this same setup (441.66, CUDA 10.2, Visual Profiler v 10.2)

And as another twist, Nsight Compute seems to run without any issues. I’m able to break on kernels and I get GPU utilization and analysis results in the report pane. So it looks like only Nsight Systems is borked.

Hi jrjbertram, could you share your report files together with your CUDA driver version (output of nvidia-smi) and Nsight Systems version?

Uploaded the report (renamed to have a .log extension so it would attach to the post.)

Report 19.qdrep.log (2.1 MB)

Sun May 03 20:37:16 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.66       Driver Version: 441.66       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8     3W /  N/A |    182MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1312    C+G   ...019.4\Monitor\Common\Nsight.Monitor.exe N/A      |
+-----------------------------------------------------------------------------+

Nsight systems version in this case is 2019.5.2.

Still looking for any combination that allows Nsight Systems to work; none of the combinations I have tried have worked on Windows 10.

Hi jrjbertram,

Thanks for your response. Looks like there is some issue with loading the CUPTI library. Could you collect logs for us to investigate further?

You can turn on logging with the following steps:

  1. Copy “nvlog.config.template” from “C:\Program Files\NVIDIA Corporation\Nsight Systems\host-windows-x64” to a location of your choice.

  2. Rename “nvlog.config.template” to “nvlog.config”.

  3. Set the environment variable “NVLOG_CONFIG_FILE” to the path of the “nvlog.config” file you just created.

  4. Run a normal collection; you should then find a “nsight-sys.log” file in the same directory as the “nvlog.config” file.

Please share the “nsight-sys.log” file you collected. Thanks!
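The steps above can be sketched as a small Python helper (paths here are illustrative; on the poster’s machine the template lives under C:\Program Files\NVIDIA Corporation\Nsight Systems\host-windows-x64, and the function name is made up for this sketch):

```python
import os
import shutil

def enable_nsys_logging(template_path, dest_dir):
    """Copy nvlog.config.template into dest_dir as nvlog.config and point
    NVLOG_CONFIG_FILE at it, so the next collection writes nsight-sys.log
    next to the config file.

    Note: os.environ only affects this process and its children; to make
    the variable visible to the Nsight Systems host application, set it
    system-wide instead (e.g. via System Properties on Windows)."""
    config_path = os.path.join(dest_dir, "nvlog.config")
    shutil.copy(template_path, config_path)
    os.environ["NVLOG_CONFIG_FILE"] = config_path
    return config_path
```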


liuyis (Employee), May 4

nsight-sys.log (1.31 MB)

Hi jrjbertram,

Looks like the log is missing some information we need for the investigation. Could you try another way: copy “nvlog.config” to the working directory of the application you are profiling, collect another log file, and share it with us.

Thanks!

Do I need to clear the environment variable as well?

Yes

Seeing a CUPTI error in the log now.

Here’s an abbreviated version of my PATH showing how I’m pointing to the CUPTI lib. (I had to add that to PATH manually because some other code or tool couldn’t find CUPTI, though I can’t recall at the moment which one. Perhaps this is part of what’s going on.)

C:\Users\josh>echo %PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\CUPTI\lib64;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;

nsight-sys.log (74.6 KB)

Hi jrjbertram,

The environment variable should not cause this issue, because we do not rely on it to find the CUPTI library; we carry our own copies under Nsight Systems’ directory. However, you could try removing the additional CUPTI paths you added, just in case. If that does not fix the issue, could you collect another log, following the same steps, using Nsight Systems 2020.2 (i.e. our current latest version)?

Thanks.

Removing CUPTI lib dir from path made no difference.

Log from Nsight Systems 2020.2.1.

nsight-sys.log (43.6 KB)

Hi Josh,

Thanks for providing the log. We’ve been investigating it. Meanwhile, could you try profiling a simple NVIDIA sample app to verify whether this issue is related to your target application? You can follow the steps in https://docs.nvidia.com/cuda/cuda-samples/index.html#building-samples to find and build the samples. I suggest trying “0_Simple/vectorAdd”. If possible, please attach the log for the sample app as well.

Thanks

64-bit debug build of sample app worked with Nsight Systems 2019.5.2, log attached.

Also confirmed that the release build of the sample app worked as well (log not attached).

nsight-sys.log (27.5 KB)

64-bit debug build of sample app also worked with Nsight Systems 2020.2.1, log attached.

nsight-sys.log (58.5 KB)

Something to be aware of: I’m invoking Python using the numba library’s CUDA support, which builds CUDA kernels on the fly using LLVM and NVVM IR (I believe). Perhaps this is part of the issue. It’s curious that nvvp works fine while Nsight Systems does not, however. Since nvvp works, it seems it should be possible.

Maybe your team needs to play with some simple numba CUDA samples to see what happens on your end?

  • Josh.

Hi Josh,

Thanks for sharing the information. I am now able to reproduce this issue on my side using a python script with numba to generate CUDA kernels. We are looking into it.

Best,
Liuyi