NSIGHT COMPUTE not working on simple CUDA example

Hey, I’ve been trying to start using NSIGHT COMPUTE to get some insight on some of my programs and was testing on a simple example with the following command:

/usr/local/NVIDIA-Nsight-Compute/ncu --export "/home/padb/Desktop/test/CUDA/output_nsight" --force-overwrite --target-processes all --section SpeedOfLight --apply-rules yes "/home/padb/Desktop/test/CUDA/a.out"

I’m running Cuda compilation tools, release 10.1, V10.1.243 and the latest version of Nsight Compute. The errors I get are the following:

==PROF== Connected to process 25992 (/home/padb/Desktop/test/CUDA/a.out)
2 + 11 = 13
==PROF== Disconnected from process 25992
==WARNING== No kernels were profiled.

Am I doing something wrong or missing something? Thanks in advance

There aren’t actually any errors reported in your output. Rather, Nsight Compute does not recognize any CUDA kernel launches in your application. Have you confirmed that your application does in fact launch its kernel, by some means other than source code inspection? Steps you can try to verify this are:

  • Collect a runtime trace using Nsight Systems and check whether any kernel launches show up in it.
  • Check the return values of all CUDA API calls in your code.
  • Step through the API calls using Nsight Compute’s Interactive Profiling activity, which breaks on and reports API errors.
  • Run the application under cuda-gdb with API error reporting enabled.
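
For the second step, here is a minimal sketch of what checking every return value could look like. I don't have your source, so I'm assuming it is a single-thread add kernel along the lines of what your "2 + 11 = 13" output suggests. The names CUDA_CHECK, add and add_on_gpu are made up for illustration and are not part of any CUDA library. The important part is that the launch itself returns nothing, so you need cudaGetLastError() right after it to catch launch failures and cudaDeviceSynchronize() to catch failures during execution.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): print the failing call's
// error name and message with file/line, then exit.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",       \
                         cudaGetErrorName(err_), __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Assumed kernel: one thread writes a + b to device memory.
__global__ void add(int a, int b, int *c) { *c = a + b; }

// Runs the kernel and returns the result, checking every API call.
int add_on_gpu(int a, int b) {
    int *dev_c = nullptr;
    int c = 0;

    CUDA_CHECK(cudaMalloc(&dev_c, sizeof(int)));

    add<<<1, 1>>>(a, b, dev_c);
    CUDA_CHECK(cudaGetLastError());       // reports launch failures
    CUDA_CHECK(cudaDeviceSynchronize());  // reports failures during execution

    CUDA_CHECK(cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev_c));
    return c;
}

int main() {
    std::printf("2 + 11 = %d\n", add_on_gpu(2, 11));
    return 0;
}
```

If the launch is failing, a version like this should exit with the error name and the line it happened on instead of quietly carrying on, and that usually points at the cause.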