The nvprof and nsys tools don’t support tracing of CUDA dynamic parallelism (CDP) kernels on Volta (compute capability 7.0) and newer GPU architectures.
In CUDA releases prior to 11.4, these tools error out early when a CUDA module contains CDP kernels, even if those kernels are never launched. CUDA 11.4 improved this: all host-launched kernels are now traced until a CDP kernel is encountered. This is documented in the Profiler Known Issues section of the CUDA Profiler guide:
CDP kernel launch tracing has a limitation for devices with compute capability 7.0 and higher. CUPTI traces all the host launched kernels until it encounters a host launched kernel which launches child kernels. Subsequent kernels are not traced.
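To make the rule above easier to check against a given setup, here is a minimal C++ sketch that encodes it as a lookup. The names `CdpTracing`, `is_volta_or_newer`, `cuda_at_least_11_4` and `cdp_tracing_behaviour` are hypothetical and only for illustration; they are not part of CUPTI, nvprof or nsys. The mapping reflects only what is stated in this thread.

```cpp
// Hypothetical helper, NOT a CUPTI / nvprof / nsys API.
// It only restates the behaviour described in this thread.

enum class CdpTracing {
    Supported,             // compute capability below 7.0
    ErrorsOutEarly,        // CC >= 7.0, CUDA < 11.4, module contains CDP kernels
    StopsAtFirstCdpParent  // CC >= 7.0, CUDA >= 11.4: host-launched kernels are
                           // traced until one launches child kernels
};

// Volta is compute capability 7.0, so the major number is enough here.
constexpr bool is_volta_or_newer(int cc_major) {
    return cc_major >= 7;
}

constexpr bool cuda_at_least_11_4(int cuda_major, int cuda_minor) {
    return cuda_major > 11 || (cuda_major == 11 && cuda_minor >= 4);
}

// cc_major: major compute capability of the device.
// cuda_major / cuda_minor: CUDA toolkit version in use.
constexpr CdpTracing cdp_tracing_behaviour(int cc_major,
                                           int cuda_major,
                                           int cuda_minor) {
    return !is_volta_or_newer(cc_major)
               ? CdpTracing::Supported
               : (cuda_at_least_11_4(cuda_major, cuda_minor)
                      ? CdpTracing::StopsAtFirstCdpParent
                      : CdpTracing::ErrorsOutEarly);
}
```

In a real program the major compute capability could be read from the `major` field of `cudaDeviceProp` after calling `cudaGetDeviceProperties`. For example, a compute capability 6.1 device with CUDA 11.1 maps to `Supported`, while a 7.0 device with CUDA 11.4 maps to `StopsAtFirstCdpParent`.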
If I change the GPU to a Tesla P40, which is Pascal (compute capability 6.1), will it work?
The nvprof and nsys tools do support tracing of dynamic parallelism (CDP) kernels on Pascal (compute capability 6.1), right?
Does this feature have any requirements for the CUDA version?
Yes, these tools support tracing of CDP kernels on Pascal and older GPU architectures. There are no specific requirements for the CUDA version. The version you are using, 11.1, should work.
Hi, I have the same limitation with an RTX A4500. I’m on CUDA 11.7 and using the latest Nsight Systems release (2023.3.1). Should I upgrade CUDA or anything else? Is this still a limitation today?