Thanks for the quick reply. However, I tested with a simple program that uses the cuLaunchKernel driver API, and nsys reported cuLaunchKernel directly instead of cudaLaunchKernel. So the cudaLaunchKernel entries shown in the report should all come from the Runtime API, right?
I tested with a simple program that uses the cuLaunchKernel driver API. nsys reported cuLaunchKernel directly instead of cudaLaunchKernel.
That’s what I meant. If you call a driver API such as cuLaunchKernel directly, Nsys captures that driver call and shows it in the report. If a driver API is instead invoked under the hood by a runtime API, Nsys skips it, so when you call cudaLaunchKernel the report shows only cudaLaunchKernel rather than both cudaLaunchKernel and cuLaunchKernel.
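If it helps to see both cases side by side, here is a minimal sketch that launches the same kernel once through the runtime API and once through the driver API. The kernel name addOne and the helper functions are made up for illustration. It also assumes CUDA 11 or newer for cudaGetFuncBySymbol, and that the returned cudaFunction_t handle can be passed to cuLaunchKernel as a CUfunction; please check that against your toolkit version.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda.h>          // driver API (cuLaunchKernel)
#include <cuda_runtime.h>  // runtime API (cudaLaunchKernel)

// Hypothetical kernel used only for this illustration.
__global__ void addOne(int *value) { *value += 1; }

// Abort with a message if a runtime API call fails.
static void checkRt(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Abort with a message if a driver API call fails.
static void checkDrv(CUresult err, const char *what) {
    if (err != CUDA_SUCCESS) {
        fprintf(stderr, "%s failed with CUresult %d\n", what, (int)err);
        exit(EXIT_FAILURE);
    }
}

int main() {
    // The first runtime call also sets up the primary context,
    // which the later driver call reuses.
    int *dValue = nullptr;
    checkRt(cudaMalloc(&dValue, sizeof(int)), "cudaMalloc");
    checkRt(cudaMemset(dValue, 0, sizeof(int)), "cudaMemset");

    void *args[] = { &dValue };  // one pointer per kernel parameter

    // Case 1: launch through the runtime API.
    checkRt(cudaLaunchKernel((const void *)addOne, dim3(1), dim3(1),
                             args, 0, nullptr),
            "cudaLaunchKernel");

    // Case 2: launch the same kernel directly through the driver API.
    // Assumption: the cudaFunction_t handle is usable as a CUfunction.
    cudaFunction_t rtFunc = nullptr;
    checkRt(cudaGetFuncBySymbol(&rtFunc, (const void *)addOne),
            "cudaGetFuncBySymbol");
    checkDrv(cuLaunchKernel((CUfunction)rtFunc,
                            1, 1, 1,   // grid dimensions
                            1, 1, 1,   // block dimensions
                            0,         // dynamic shared memory bytes
                            nullptr,   // default stream
                            args, nullptr),
             "cuLaunchKernel");

    checkRt(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    int hValue = 0;
    checkRt(cudaMemcpy(&hValue, dValue, sizeof(int), cudaMemcpyDeviceToHost),
            "cudaMemcpy");
    printf("value = %d\n", hValue);

    checkRt(cudaFree(dValue), "cudaFree");
    return 0;
}
```

Something like `nvcc demo.cu -o demo -lcuda` followed by `nsys profile ./demo` should be enough to try it (the file name is just a placeholder). Going by the behavior described above, I would expect the first launch to show up as cudaLaunchKernel only and the second as cuLaunchKernel, but I have not attached a report here, so treat that as something to verify on your side.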