Latest Nsight Systems and Nvidia Driver aren't compatible?

Reran the CUDA 10.2 installer. It wanted to install driver 441.22.

Looked through the historical driver downloads. There’s no 441.22 available to install.

Guessing that 441.41 must be the closest compatible driver.

(Side quest: I had a bit of a struggle getting a driver to install. It turned out that I needed the DCH driver type rather than the standard one.)

Was able to install driver 441.66 with CUDA 10.2. Seeing the same result with Nsight Systems 2019.5.2: it shows the CPU side of things but no GPU results, and reports “Incompatible CUDA driver version. Please try updating the CUDA driver or use more recent profiler version.”

I’m out of ideas.

And just to confirm, nvvp works fine with this same setup (441.66, CUDA 10.2, Visual Profiler v 10.2)

And as another twist, Nsight Compute seems to run without any issues. I’m able to break on kernels and I get GPU utilization and analysis results in the report pane. So it looks like only Nsight Systems is borked.

Hi jrjbertram, could you share your report files together with the CUDA driver version (the output of nvidia-smi) and the Nsight Systems version?

Uploaded the report (renamed to have a .log extension so it would attach to the post.)

Report 19.qdrep.log (2.1 MB)

Sun May 03 20:37:16 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.66       Driver Version: 441.66       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8     3W /  N/A |    182MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1312    C+G   ...019.4\Monitor\Common\Nsight.Monitor.exe N/A      |
+-----------------------------------------------------------------------------+

The Nsight Systems version in this case is 2019.5.2.

I'm looking for any combination that allows Nsight Systems to work. None of the combinations I have tried have worked on Windows 10.

Hi jrjbertram,

Thanks for your response. Looks like there is some issue with loading CUPTI library. Could you collect logs for us to investigate further?

You can turn on logging with the following steps:

  1. Copy “nvlog.config.template” from "C:\Program Files\NVIDIA Corporation\Nsight Systems \host-windows-x64" to a location of your choice.

  2. Rename “nvlog.config.template” to “nvlog.config”.

  3. Set the environment variable “NVLOG_CONFIG_FILE” to the path of the “nvlog.config” file you just created.

  4. Run a normal collection. You should then find a “nsight-sys.log” file in the same directory as the “nvlog.config” file.

Please share the “nsight-sys.log” file you collected. Thanks!
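For anyone who prefers to script the logging steps above, here is a minimal Python sketch. The function names `prepare_nvlog_config` and `env_with_nvlog` are hypothetical helpers, not part of Nsight Systems, and `host_dir` must be set by the reader to the actual `host-windows-x64` directory of their install, since the path quoted in the thread does not include the version. Only the file names and the `NVLOG_CONFIG_FILE` variable come from the steps themselves.

```python
import os
import shutil
from pathlib import Path


def prepare_nvlog_config(host_dir, dest_dir):
    """Copy nvlog.config.template from host_dir into dest_dir as nvlog.config.

    Covers steps 1 and 2 (copy, then rename) in a single copy operation.
    Returns the path of the new nvlog.config file.
    """
    template = Path(host_dir) / "nvlog.config.template"
    dest = Path(dest_dir) / "nvlog.config"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, dest)
    return dest


def env_with_nvlog(config_path, base_env=None):
    """Return a copy of the environment with NVLOG_CONFIG_FILE set (step 3)."""
    env = dict(os.environ if base_env is None else base_env)
    env["NVLOG_CONFIG_FILE"] = str(config_path)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` for whatever command starts the collection (step 4). According to the steps above, `nsight-sys.log` should then appear next to `nvlog.config`.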


nsight-sys.log (1.31 MB)

Hi jrjbertram,

It looks like the log is missing some information we need to investigate. Could you try another way: copy “nvlog.config” to the working directory of the application that you are profiling, collect another log file, and share it with us?

Thanks!

Do I need to clear the environment variable as well?

Yes

Seeing a CUPTI error in the log now.

Here’s an abbreviated version of my path showing how I’m pointing to CUPTI lib. (I had to add that to path manually due to some other code / tool not being able to find CUPTI, though I can’t recall at the moment which code/tool needed it. Perhaps this is part of what’s going on.)

C:\Users\josh>echo %PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\CUPTI\lib64;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;

nsight-sys.log (74.6 KB)

Hi jrjbertram,

The PATH environment variable should not cause this issue, because we do not rely on it to find the CUPTI library. We ship our own versions under the Nsight Systems directory. However, you could try removing the additional CUPTI paths you added, just in case. If that does not fix the issue, could you collect another log following the same steps using Nsight Systems 2020.2 (i.e. our current latest version)?

Thanks.
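A quick way to see which PATH entries mention CUPTI, and what PATH would look like without them, is sketched below. The helper names `cupti_entries` and `without_cupti` are hypothetical, and matching on the substring "cupti" is an assumption about how such directories are named, based on the `extras\CUPTI\lib64` entry shown earlier in the thread.

```python
import os


def cupti_entries(path_value, sep=os.pathsep):
    """Return the PATH entries whose text mentions CUPTI (case-insensitive)."""
    return [p for p in path_value.split(sep) if p and "cupti" in p.lower()]


def without_cupti(path_value, sep=os.pathsep):
    """Return path_value with CUPTI entries (and empty entries) dropped."""
    kept = [p for p in path_value.split(sep) if p and "cupti" not in p.lower()]
    return sep.join(kept)


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    for entry in cupti_entries(current):
        print("CUPTI entry on PATH:", entry)
```

This only inspects or rewrites a string. Changing the PATH used by the profiler still means editing the environment variable in Windows, or launching the tool with a modified environment.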

Removing CUPTI lib dir from path made no difference.

Log from Nsight Systems 2020.2.1.

nsight-sys.log (43.6 KB)

Hi Josh,

Thanks for providing the log. We’ve been investigating it. Meanwhile, could you try profiling a simple NVIDIA sample app to verify if this issue is related to your target application? You can follow steps in CUDA Samples :: CUDA Toolkit Documentation to find and build samples. I suggest trying “0_Simple/vectorAdd”. If possible, please attach the log for the sample app also.

Thanks

64-bit debug build of sample app worked with Nsight Systems 2019.5.2, log attached.

Also confirmed that the release build of the sample app worked as well (log not attached).

nsight-sys.log (27.5 KB)

64-bit debug build of sample app also worked with Nsight Systems 2020.2.1, log attached.

nsight-sys.log (58.5 KB)

Something to be aware of… I’m running Python with the numba library’s CUDA support, which builds CUDA kernels on the fly using LLVM and NVVM IR (I believe). Perhaps this is part of the issue. It’s curious, however, that nvvp works fine but Nsight Systems does not. Since nvvp works, it seems that profiling this kind of application should be possible.

Maybe your team needs to play with some simple numba cuda samples to see what happens on your end?
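For reference, a simple numba CUDA sample of the kind mentioned above might look like the sketch below. It is not the application from this thread, just a minimal vector add using numba's documented `cuda.jit` interface. The names `blocks_for`, `vector_add`, and `main` are illustrative, and the numba and NumPy imports sit inside `main` so the file can still be loaded on a machine without a GPU.

```python
import math


def blocks_for(n, threads_per_block=256):
    """Number of thread blocks needed to cover n elements."""
    return math.ceil(n / threads_per_block)


def main(n=1_000_000):
    # Imported here so the module loads even where numba or a GPU is missing.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def vector_add(a, b, out):
        i = cuda.grid(1)          # global thread index
        if i < out.size:          # guard threads past the end of the array
            out[i] = a[i] + b[i]

    a = np.arange(n, dtype=np.float32)
    b = 2 * a
    d_a = cuda.to_device(a)
    d_b = cuda.to_device(b)
    d_out = cuda.device_array_like(a)

    threads = 256
    # The kernel is JIT-compiled by numba on this first launch.
    vector_add[blocks_for(n, threads), threads](d_a, d_b, d_out)
    cuda.synchronize()

    out = d_out.copy_to_host()
    assert np.allclose(out, a + b)
    print("vector_add OK for", n, "elements")


if __name__ == "__main__":
    main()
```

Running a script like this as the profiled target should show whether the missing GPU data is tied to JIT-compiled numba kernels rather than to one particular application.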

- Josh.

Hi Josh,

Thanks for sharing the information. I am now able to reproduce this issue on my side using a python script with numba to generate CUDA kernels. We are looking into it.

Best,
Liuyi

That’s great news, thank you!

Hi jrjbertram,

We have finally identified the issue. Our next release (2021.2) will contain a fix for this bug. Thanks again for reporting it.

Best,
Liuyi