I tried to profile a CUDA kernel on an L20 GPU with Nsight Compute, but profiling failed:
ncu -o vecAdd_profile ./Samples/0_Introduction/vectorAdd/vectorAdd
[Vector addition of 50000 elements]
==PROF== Connected to process 42863 (/data/shuren/code/cuda-samples/Samples/0_Introduction/vectorAdd/vectorAdd)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==ERROR== Failed to prepare kernel for profiling
==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile "vectorAdd" in process 42863
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
My CUDA toolkit and driver are both the latest versions, and the kernel runs correctly without ncu.
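In case it helps with reproducing this, here is a rough sketch of how the relevant versions could be collected on the machine. The helper name `report_version` is made up for illustration, and the script assumes `nvidia-smi`, `nvcc`, and `ncu` are on PATH (it only prints a note when one is missing). No output is shown because it depends on the installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a tool's version, or note that it is not on PATH.
report_version() {
    local label="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        echo "== ${label} =="
        "$@"
    else
        echo "== ${label} == not found on PATH"
    fi
}

# GPU model and driver version as reported by the driver
report_version "GPU and driver" nvidia-smi --query-gpu=name,driver_version --format=csv
# CUDA toolkit (compiler) version
report_version "CUDA toolkit" nvcc --version
# Nsight Compute CLI version
report_version "Nsight Compute" ncu --version
```

Comparing the Nsight Compute version against the driver version is probably the most useful part, since a profiler release older than the GPU may not recognize it.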
Does Nsight Compute support the L20 GPU, or is there something wrong with my environment?