No kernels were profiled warning/problem


I’m trying to profile my application on a DGX box, on the 3rd (counting from 0) V100 it contains. When running, I get the warning “no kernels were profiled”. Any ideas what’s going on? This is with CUDA 10.0 and Ubuntu with the 4.4.0 kernel. I’m fairly sure that the related .cu file was compiled with -G, but I’m under the impression that the kernel is profilable (at a high level) either way. The command and response follow.


myuser@dgx-test:~/r/my_dir$ /usr/local/NVIDIA-Nsight-Compute-2019.4/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli --devices 3 --export "/home/myuser/r/my_dir/nsight_compute_prof1" --force-overwrite --target-processes all --kernel-regex my_kernel_name_copy_pasted --kernel-regex-base function --launch-skip-before-match 0 --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section WarpStateStats --sampling-interval auto --sampling-max-passes 5 --sampling-buffer-size 33554432 --nvtx --profile-from-start 1 --clock-control base --apply-rules "/home/myuser/r/my_dir/my_binary" arg1 arg2 arg3
<output indicating the process is running>
==PROF== Connected to process 3442
my_particular_test: PASS (latency XXX things/s)
1 test passed
==PROF== Disconnected from process 3442
==WARNING== No kernels were profiled

My assumption would be that the combination of filters you are using causes no kernels in your application to match. Either the kernel “my_kernel_name_copy_pasted” is not running on device 3, or the name simply doesn’t match.

My suggestion would be to start with a simpler command line, since most of the parameters you are passing match the defaults anyway and likely aren’t necessary in your case. Start with

myuser@dgx-test:~/r/my_dir$ /usr/local/NVIDIA-Nsight-Compute-2019.4/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli --devices 3 --export "/home/myuser/r/my_dir/nsight_compute_prof1" --force-overwrite --target-processes all --section SpeedOfLight --apply-rules "/home/myuser/r/my_dir/my_binary" arg1 arg2 arg3

and see if that matches your kernel. If that works, check the kernel name that is shown and try adding back the kernel name filter.

--kernel-regex my_kernel_name_copy_pasted

Thanks for the idea @felix_dt!

As background, I generated that long command using the Nsight GUI. BTW, it’s pretty nice that the GUI actually shows me the command it’s about to run, though I had to modify it manually to restrict execution to device 3.

Even without the kernel restriction, I still receive the warning of “no kernels were profiled”.

Since posting this, I’ve learned that sometimes nsight/nvvp/nvprof struggles to profile on any device other than the default device. I don’t really have access to device 0 on this machine, but does that sound like it could be part of the issue? My guess is that the nsight developers tend to develop for “device 0” and so are less likely to have completely tested the non-0 devices.

BTW, here’s my environment, and I’m able to confirm via nvidia-smi that the program is indeed running on Device 3.

myuser@dgx-test:~/$ env | grep "DEVICE"
CUDA_VISIBLE_DEVICES=3

I think the problem here might be the combination of CUDA_VISIBLE_DEVICES together with --devices.

Using the environment variable, you are instructing the CUDA driver that there should be only one device visible to CUDA applications (device 3 in your system), which will be made available to CUDA as the first device, i.e. device 0.
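To make that renumbering concrete, here is a small illustrative sketch (not driver code; just a model of the visible-ID → physical-ID mapping that `CUDA_VISIBLE_DEVICES` produces):

```python
# Illustrative sketch of how CUDA_VISIBLE_DEVICES renumbers devices.
# NOT the driver's actual implementation; it only models the mapping
# from the device IDs an application sees to physical device IDs.

def visible_to_physical(cuda_visible_devices: str) -> dict:
    """Map each visible CUDA device ID (0, 1, ...) to its physical ID."""
    physical_ids = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    return {visible: physical for visible, physical in enumerate(physical_ids)}

mapping = visible_to_physical("3")
print(mapping)       # {0: 3}: the app's "device 0" is physical device 3
print(3 in mapping)  # False: no visible device has ID 3, so --devices 3 matches nothing
```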

Using the --devices 3 options, you are instructing Nsight Compute to restrict profiling to the fourth device (the one with ID 3), but there aren’t four devices anymore at that point.

If you really only want your application to run with CUDA_VISIBLE_DEVICES=3, there is no need for the --devices option (or it can be set to 0). If you want your application to run on all devices but only profile the device with ID 3, remove the CUDA_VISIBLE_DEVICES environment variable and keep the --devices 3 option.
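Concretely, the two consistent setups would look something like this (paths shortened from your original command line; this is just a sketch):

```shell
# Option A: restrict the app itself to physical device 3. It then appears
# to CUDA as device 0, so --devices is unnecessary (or must be 0).
CUDA_VISIBLE_DEVICES=3 nv-nsight-cu-cli --devices 0 --section SpeedOfLight ./my_binary arg1 arg2 arg3

# Option B: let the app see all GPUs, but profile only physical device 3.
unset CUDA_VISIBLE_DEVICES
nv-nsight-cu-cli --devices 3 --section SpeedOfLight ./my_binary arg1 arg2 arg3
```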

Ah ha, this is good insight to have, as it would be difficult for me to track this down on my own. I’m giving it a whirl at the moment, but there’s an unrelated issue that’s keeping me from running the test. I’ll give it another try in the morning and report back. Thanks!

@felix_dt I was able to run the test just now. The test itself passes, but the profiler is having issues:

my_kernel: ==ERROR== Error: ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see

This is what motivated me to use the --devices 3 option.

I don’t have sudo access on this machine. I’ve never needed sudo to profile in the past, so I’m surprised I need it. However, that link states that it’s a relatively new requirement from the 418.43 driver, and a lot of my experience is with a driver that’s slightly older than that. I guess I’ll see if it makes sense for me to gain sudo access…

Seeing this error message now is actually a good thing, as it implies that the profiler is now finding a kernel (on your physical device 3, CUDA device 0). As you found, it’s a new requirement from the driver, and you will need to work with your machine owner to get access again using one of the options listed on the page:

  • run the profiler as root/sudo
  • temporarily load the kernel module with NVreg_RestrictProfilingToAdminUsers=0
  • permanently enable profiling for non-admin users with a file in /etc/modprobe.d
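For reference, the permanent option is a one-line config file in /etc/modprobe.d (the filename below is arbitrary; your machine admin would need to create it as root and reload the module or reboot, per NVIDIA’s ERR_NVGPUCTRPERM instructions):

```shell
# /etc/modprobe.d/nvidia-profiling.conf
# Allow non-admin users to access GPU performance counters.
options nvidia NVreg_RestrictProfilingToAdminUsers=0
```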