Nsys does not show breakdown of kernel

By default nsys seems to only provide profiling for the global kernel being called.

Say I have a `__global__` kernel k1, which in turn calls some `__device__` functions k2, k3, and k4. Is there an option I can pass to nsys so that it produces a breakdown of k1 showing how long each of k2, k3, and k4 runs?

Nsys is not set up to do this.

@mjain is this something that can be done by invoking CUPTI directly?

CUPTI doesn’t provide timing information for device functions. Please check whether the clock() or clock64() functions provided by CUDA help in your case. These functions are documented in the Programming Guide, part of the CUDA Toolkit Documentation.
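For illustration, here is a minimal sketch of the clock64() approach, not a definitive implementation. The bodies of k2, k3, and k4 are made-up placeholders, and `run_breakdown`, `kBlocks`, and `kThreads` are hypothetical names introduced for this example. It assumes one sample per block (taken by thread 0) is representative enough. The values are SM clock cycles rather than seconds, and the compiler is free to inline the device functions and move some arithmetic across the clock reads, so treat the numbers as approximate.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlocks = 4;     // hypothetical launch configuration
constexpr int kThreads = 128;

// Placeholder device functions standing in for k2, k3, k4.
__device__ float k2(float x) { for (int i = 0; i < 100; ++i) x = x * 1.0001f + 0.5f; return x; }
__device__ float k3(float x) { for (int i = 0; i < 200; ++i) x = sqrtf(x + 1.0f); return x; }
__device__ float k4(float x) { for (int i = 0; i < 50; ++i) x = x * 0.5f + 0.1f; return x; }

// k1 reads the per-SM cycle counter around each device function call.
// Thread 0 of every block stores its three deltas: [k2, k3, k4].
__global__ void k1(float *data, long long *cycles) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = data[idx];

    long long t0 = clock64();
    v = k2(v);
    long long t1 = clock64();
    v = k3(v);
    long long t2 = clock64();
    v = k4(v);
    long long t3 = clock64();

    data[idx] = v;  // keep the result live so the work is not optimized away
    if (threadIdx.x == 0) {
        cycles[3 * blockIdx.x + 0] = t1 - t0;
        cycles[3 * blockIdx.x + 1] = t2 - t1;
        cycles[3 * blockIdx.x + 2] = t3 - t2;
    }
}

// Launches k1 and sums the per-block cycle counts for k2, k3, k4 into totals.
bool run_breakdown(long long totals[3]) {
    const int n = kBlocks * kThreads;
    float *d_data = nullptr;
    long long *d_cycles = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_cycles, 3 * kBlocks * sizeof(long long)) != cudaSuccess) {
        cudaFree(d_data);
        return false;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    k1<<<kBlocks, kThreads>>>(d_data, d_cycles);

    // cudaMemcpy waits for the kernel and reports any launch/runtime error.
    std::vector<long long> h(3 * kBlocks);
    cudaError_t err = cudaMemcpy(h.data(), d_cycles, h.size() * sizeof(long long),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_cycles);
    if (err != cudaSuccess) return false;

    totals[0] = totals[1] = totals[2] = 0;
    for (int b = 0; b < kBlocks; ++b)
        for (int f = 0; f < 3; ++f)
            totals[f] += h[3 * b + f];
    return true;
}

int main() {
    long long totals[3];
    if (!run_breakdown(totals)) {
        fprintf(stderr, "CUDA error while running k1\n");
        return 1;
    }
    const char *names[3] = {"k2", "k3", "k4"};
    for (int f = 0; f < 3; ++f)
        printf("%s: %lld cycles (average over %d blocks)\n",
               names[f], totals[f] / kBlocks, kBlocks);
    return 0;
}
```

Built with something like `nvcc breakdown.cu -o breakdown`, this prints one averaged cycle count per device function. Since nsys still only sees k1 as a single kernel, the breakdown has to be copied back and reported by the application itself, as done here.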
