`ncu` "No kernels profiled"


I’m running my application with

ncu --target-processes all ./application args

And consistently getting

==PROF== Disconnected from process 89889
==WARNING== No kernels were profiled.

at the end of my application’s run. I’m on Ubuntu 20.04 with CUDA 11.1. I’m certain the GPU is being used (nvidia-smi reports activity as expected). I’ve found many similar topics on these forums, but none of the fixes suggested in them help.

Additional points:

  • nsys profile ./application args seems to work fine.
  • I am using a V100
  • I cannot use Visual Studio, but I’d like to be able to get specific GPU performance metrics for my application
  • nvprof similarly tells me that “No kernels were profiled”
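To compare runs repeatably, you can check the profiler output for that warning automatically. Below is a minimal sketch of a hypothetical helper script. The names `run_under_ncu`, `profiled_any_kernels` and `check_ncu.py` are illustrative and not part of any NVIDIA tool. It assumes `ncu` is on PATH and matches the warning text quoted above.

```python
import subprocess
import sys
from typing import List

# Substring that, per the reports in this thread, both ncu and nvprof print
# when no kernel was captured. Matched case-insensitively.
NO_KERNELS_WARNING = "no kernels were profiled"


def profiled_any_kernels(profiler_output: str) -> bool:
    """Return False if the output contains the 'No kernels were profiled' warning."""
    return NO_KERNELS_WARNING not in profiler_output.lower()


def run_under_ncu(app_cmd: List[str]) -> str:
    """Run app_cmd under ncu (same flags as above) and return stdout+stderr as text."""
    result = subprocess.run(
        ["ncu", "--target-processes", "all", *app_cmd],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge so the ==WARNING== line is captured either way
        text=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 check_ncu.py ./application args
    output = run_under_ncu(sys.argv[1:])
    print(output)
    if not profiled_any_kernels(output):
        sys.exit("ncu profiled no kernels for: " + " ".join(sys.argv[1:]))
```

First run it against a known-good CUDA binary, then against your own application. That gives a quick yes/no on whether the problem is specific to the application.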

Any next steps?


Can you share some more information about how your application executes? For example, does your application fork child processes that launch CUDA kernels? Is that the reason you’re using --target-processes all? Is there hand-written CUDA in there, or a third-party library, or a higher-level framework like PyTorch?

Do you have access to the GUI? If so, you could launch an interactive profile to step through the API calls and see whether you reach a CUDA kernel launch.

My application does not fork anything. I just added --target-processes all to make sure I capture everything. There is a mix of hand-written CUDA, cuBLAS, and even cuSOLVER code in there. I have (crappy X11) access to the nvprof GUI, and I can see the kernels executing as expected.

You mentioned X11, so what environment are you using? Is this all happening locally on an Ubuntu box, or are you connected over SSH to the Ubuntu machine where the app and the profiler are running? Are there multiple devices (GPUs) on the target machine? Can you share the output of ‘nvidia-smi’?

I would recommend checking whether you can profile a simple CUDA sample. If you don’t have the samples installed, they are on GitHub in the NVIDIA/cuda-samples repository. Something like the matrixMul sample (cuda-samples/Samples/0_Introduction/matrixMul) is a good way to see whether any profiling works at all.

I’m connected over SSH to the Ubuntu machine that has the GPU.

The output of nvidia-smi is:
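Wed Sep 28 21:29:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000001:00:00.0 Off |                  Off |
| N/A   28C    P0    35W / 250W |      0MiB / 16160MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+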

I just tested matrixMul, and it works fine with nvprof --metrics all ./matrixMul (which is what I would want for my own application).
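On the ncu side, the closest counterpart to nvprof --metrics all is probably the `full` section set (`--set full`). Running `ncu --list-sets` shows which sets your ncu version provides. The sketch below is a hypothetical helper (`ncu_full_metrics_cmd` is an illustrative name) that only builds that command line. Treat it as a rough analogue, not a metric-for-metric mapping of nvprof’s output.

```python
import shlex
from typing import List


def ncu_full_metrics_cmd(app_cmd: List[str]) -> List[str]:
    """Build an ncu command that collects the 'full' section set for each kernel.

    Assumption: `--set full` is used as a rough stand-in for `nvprof --metrics all`;
    the two tools do not expose identical metric lists.
    """
    return ["ncu", "--set", "full", *app_cmd]


# Print the shell-quoted command so it can be pasted into the terminal.
print(shlex.join(ncu_full_metrics_cmd(["./matrixMul"])))
```

Collecting the full set replays each kernel many times, so expect it to be much slower than a default ncu run on a large application.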

… now it seems fixed on my own application too? Honestly I’m super confused…

Does ncu work on your application or just nvprof?

Yeah, somehow it works now…? I’m sort of baffled.