VS2019 CUDA 10.1 unable to get profile report with Nsight

I just bought a new Win 10 computer with an RTX 2070, installed VS 2019 and the latest CUDA toolkit (10.1).
My CUDA project builds and runs perfectly, so I want to profile it. Pain!

  1. I tried to do ‘Profile CUDA application’ under the VS IDE, but it says the RTX 2070 is unsupported for the Nsight profiler. Huh? The latest profiler does not support one of the latest GPUs?

  2. Still under the VS IDE I selected ‘Profile CUDA application with Nsight Compute’. That ran and apparently profiled because my app ran about 100 times slower. But no results appeared. I looked all over the IDE interface and all over the hard drive, but nothing. No report.

  3. I switched to Nsight Compute outside the IDE. First, I tried ‘Interactive profile’. Again the program ran very slowly, obviously being profiled. But when I exited my program, again there was no report. It was as if I had not run anything.

  4. I switched to just ‘Profile’. It ran very slowly again, apparently profiling. But when I exited my program, I got an error message “The profiler returned an error code 1. Failed to load report. Could not open file.”

So what do I have to do to get a profile of my program? The debugger works perfectly on it, stopping at breakpoints and single-stepping, and it seems that the profiler is running. Just no report!

Thanks for any hints!

I am also stuck on this problem.

I used nvprof to profile my application, but an error occurs:

======== Warning: This version of nvprof doesn’t support the underlying device, GPU profiling skipped
======== Error: Application returned non-zero code -1

I encountered the same problem. My platform is the following:
RTX 2060 (TU106 - laptop) which is supported (according to Nsight Compute Release Notes)
CUDA toolkit 10.1
Visual Studio 2015
Nsight Compute 2019.4.0 (running with admin privileges)

I have a desktop system with VS 2019, Win 10, CUDA 10.1U2 (10.1.243) and RTX 2070. I did not install any separate version of Nsight Compute; I am just using the version that was installed with CUDA 10.1U2.

One of the problems you can run into with profiling is covered here:

https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters

To work around this, I started by launching a command prompt in Windows 10 with the “Run as administrator” option. This isn’t specific to CUDA, so please search for instructions if you need them.
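If you are not sure whether a given prompt is really elevated, a small check like the following can confirm it before you launch the profiler. This is only a sketch: the function name `is_elevated` is made up for illustration, and it relies on the Windows `IsUserAnAdmin` shell call via `ctypes` (with a root check as a fallback on other systems).

```python
import ctypes
import os


def is_elevated() -> bool:
    """Return True if the current process has administrator (or root) rights."""
    if os.name == "nt":
        # IsUserAnAdmin is a Windows shell32 call; a nonzero result means elevated.
        try:
            return bool(ctypes.windll.shell32.IsUserAnAdmin())
        except (AttributeError, OSError):
            return False
    # Fallback for non-Windows systems so the sketch also runs there.
    return os.geteuid() == 0


if __name__ == "__main__":
    print("elevated" if is_elevated() else "not elevated")
```

Run it from the same prompt you intend to start the profiler from.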

From this administrator command prompt I typed

nv-nsight-cu

to start the GUI.

At the opening dialog, I selected the “Continue” button in the “Quick Launch” area.

In the next dialog, in the “Target Platform” … “Application Executable” field I entered the app, including full path/full name. The profiler seems to be sensitive to entering the correct format here, so the best way I found to do that is to use the … button to navigate to the app to be profiled, then select it that way.

In the lower half of the dialog, in the “Activity” area, I chose Profile, not Interactive Profile.

In this case it’s mandatory to enter a file name for the profiler data file. Here again I suggest using the … button to navigate to the directory where you want this file to be stored. Then enter the file name. The profiler will add to this filename when it actually creates the file.
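The same two inputs (a full path to the application and a report file in a writable directory) can also be checked in a script before anything is launched, which may help with “Could not open file” style errors. The sketch below is an illustration rather than an official workflow: `build_profile_command` is a hypothetical helper, and it assumes the command-line profiler that ships with Nsight Compute 2019.x is on the PATH as `nv-nsight-cu-cli` and accepts `-o` for the report name. Check `nv-nsight-cu-cli --help` on your install.

```python
import os
from pathlib import Path


def build_profile_command(app, report, cli="nv-nsight-cu-cli"):
    """Validate both paths, then return the argument list for a CLI profile run."""
    app_path = Path(app).resolve()        # full path to the executable
    report_path = Path(report).resolve()  # full path to the report base name
    if not app_path.is_file():
        raise FileNotFoundError(f"application not found: {app_path}")
    if not report_path.parent.is_dir():
        raise FileNotFoundError(f"report directory missing: {report_path.parent}")
    if not os.access(report_path.parent, os.W_OK):
        raise PermissionError(f"cannot write to: {report_path.parent}")
    # Assumption: the CLI takes -o <report base name> followed by the application.
    return [cli, "-o", str(report_path), str(app_path)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from an administrator prompt.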

At this point you should be able to click the blue Launch button. If everything is working, the command prompt will show the program being executed, and then the profiler will process the output and open the report in a new window.

I recommend starting with a fairly simple application to learn the mechanics. The default profiler settings may run your app a number of times to collect the requested data. You can reduce the requested data from the default selections. The app I used here was the default app you get if you start a new CUDA runtime project in VS2019.
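If you drive the profiler from a script instead of the GUI, reducing the requested data comes down to a few extra arguments. The sketch below assumes the Nsight Compute CLI options `--launch-count` and `--section`, and uses `SpeedOfLight` as an example section identifier. The helper name `reduced_collection_args` is hypothetical, and the options and section names should be confirmed with `--help` and `--list-sections` for your version.

```python
def reduced_collection_args(launch_count=1, sections=("SpeedOfLight",)):
    """Return extra CLI arguments that shrink how much data is collected."""
    if launch_count < 1:
        raise ValueError("launch_count must be at least 1")
    # Assumption: --launch-count limits how many kernel launches are profiled.
    args = ["--launch-count", str(launch_count)]
    # Assumption: each --section names one section to collect instead of the defaults.
    for name in sections:
        args += ["--section", name]
    return args
```

These arguments would go before the application path on the profiler command line.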

This is not intended to be a full tutorial. You can find more information in this blog:

https://devblogs.nvidia.com/using-nsight-compute-to-inspect-your-kernels/

as well as the Nsight Compute documentation:

https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#quick-start