Guessing that 441.41 must be the closest compatible driver.
(Side quest: I had a bit of a struggle getting the driver to install. It turned out I needed the DCH driver type rather than Standard.)
I was able to install driver 441.66 with CUDA 10.2. I'm seeing the same result with Nsight Systems 2019.5.2: it can see the CPU side of things but shows no GPU results, and reports "Incompatible CUDA driver version. Please try updating the CUDA driver or use more recent profiler version."
And as another twist, Nsight Compute seems to run without any issues. I’m able to break on kernels and I get GPU utilization and analysis results in the report pane. So it looks like only Nsight Systems is borked.
It looks like the log is missing some information we need for the investigation. Could you try another way: copy "nvlog.config" to the working directory of the application you are profiling, collect another log file, and share it with us?
Here's an abbreviated version of my PATH showing how I'm pointing to the CUPTI lib. (I had to add that to PATH manually because some other code/tool couldn't find CUPTI, though I can't recall at the moment which one needed it. Perhaps this is part of what's going on.)
The environment variable should not cause this issue, because we do not rely on it to find the CUPTI library; we carry our own copies under the Nsight Systems directory. However, you could try removing the additional CUPTI paths you added, just in case. If that does not fix the issue, could you collect another log following the same steps using Nsight Systems 2020.2 (i.e. our current latest version)?
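If it's useful while cleaning up, here is a quick diagnostic sketch (nothing Nsight-specific, just standard Python) for spotting any stray CUPTI entries still on PATH:

```python
import os

# Print any PATH entries that mention CUPTI (diagnostic only)
for entry in os.environ.get("PATH", "").split(os.pathsep):
    if "cupti" in entry.lower():
        print(entry)
```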
Thanks for providing the log. We've been investigating it. Meanwhile, could you try profiling a simple NVIDIA sample app to verify whether this issue is related to your target application? You can follow the steps in CUDA Samples :: CUDA Toolkit Documentation to find and build the samples. I suggest trying "0_Simple/vectorAdd". If possible, please attach the log for the sample app as well.
Something to be aware of… I'm invoking Python using the numba library's CUDA support, which builds CUDA kernels on the fly using LLVM and NVVM IR (I believe). Perhaps this is part of the issue. It's curious, however, that nvvp works fine while Nsight Systems does not. Since nvvp works, it seems that profiling these kernels should be possible.
Maybe your team could play with some simple numba CUDA samples to see what happens on your end? Here's a minimal one to start from.
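For reference, a minimal sketch along the lines of what I'm running (the kernel and sizes are illustrative, not my actual code):

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)  # global thread index
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.arange(n, dtype=np.float32)
b = np.arange(n, dtype=np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block

# numba JIT-compiles the kernel via NVVM on first launch;
# host arrays are implicitly copied to and from the device
vector_add[blocks, threads_per_block](a, b, out)

assert np.allclose(out, a + b)
```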
Thanks for sharing the information. I am now able to reproduce this issue on my side using a Python script with numba to generate CUDA kernels. We are looking into it.