Programmatically obtain kernel summary

Is there a way to programmatically obtain basic information about a running kernel? For example, the summary columns of Nsight Compute:


I am particularly interested in the Cycles count and Registers columns.

Or the kernel summary from Nsight Systems.

I tried looking into CUPTI, but could not find any relevant API.

Short of this, can one obtain the Nsight Systems summary of kernels using the nsys command (i.e. without the GUI)? We are trying to build performance metrics from values like local memory usage.

You can get some information from cudaFuncGetAttributes; also see here and here. However, I don't know if that is really what you are asking for. It has nothing to do with running kernels per se, since it reports on a kernel you ask for by name, and it certainly does not have all the information the profiler provides (it does not include the cycle count, for example). If your question is about usage of either nsys or CUPTI, we have separate forums for those, and I recommend asking on one of them.

This information is available through the CUPTI Activity API via the CUpti_ActivityKernel9 (or earlier) records. The record is provided after the grid completes.

Latency and Launched from thread come from the CUpti_ActivityAPI record. To pair the two records, match api_record.correlationId to kernel9_record.correlationId.
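
To make the correlation-ID matching concrete, here is a minimal C++ sketch. It does not call CUPTI. `ApiRecord`, `KernelRecord`, `KernelSummary`, and `summarize` are hypothetical stand-ins that mirror only a few fields of the real records, and it assumes latency means kernel start minus API call start, which may not be exactly how Nsight Systems defines it. In a real tool you would fill these from the records delivered in the CUPTI activity buffer callback.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the fields of interest in an API activity record.
struct ApiRecord {
    uint32_t correlationId;
    uint32_t threadId;   // host thread that issued the launch
    uint64_t start;      // timestamp in ns
    uint64_t end;        // timestamp in ns
};

// Hypothetical stand-in for the fields of interest in a kernel activity record.
struct KernelRecord {
    uint32_t correlationId;
    const char* name;
    uint64_t start;      // timestamp in ns
    uint64_t end;        // timestamp in ns
    uint16_t registersPerThread;
    uint32_t localMemoryPerThread;  // bytes
};

// One row of the summary we want to build.
struct KernelSummary {
    std::string name;
    uint64_t durationNs = 0;
    uint64_t latencyNs = 0;          // assumed: kernel start - API start
    uint32_t launchThreadId = 0;
    uint16_t registersPerThread = 0;
    uint32_t localMemoryPerThread = 0;
    bool hasApiRecord = false;       // false if no matching API record was seen
};

std::vector<KernelSummary> summarize(const std::vector<ApiRecord>& apis,
                                     const std::vector<KernelRecord>& kernels) {
    // Index API records by correlation ID for constant-time lookup.
    std::unordered_map<uint32_t, const ApiRecord*> apiById;
    for (const ApiRecord& a : apis) {
        apiById[a.correlationId] = &a;
    }

    std::vector<KernelSummary> rows;
    rows.reserve(kernels.size());
    for (const KernelRecord& k : kernels) {
        KernelSummary row;
        row.name = k.name ? k.name : "";
        row.durationNs = k.end - k.start;
        row.registersPerThread = k.registersPerThread;
        row.localMemoryPerThread = k.localMemoryPerThread;

        auto it = apiById.find(k.correlationId);
        if (it != apiById.end()) {
            const ApiRecord& a = *it->second;
            row.hasApiRecord = true;
            row.launchThreadId = a.threadId;
            // Guard against underflow if the timestamps are not ordered.
            row.latencyNs = (k.start >= a.start) ? (k.start - a.start) : 0;
        }
        rows.push_back(row);
    }
    return rows;
}
```

Kernel records whose launch API record was not captured are still reported, with `hasApiRecord` left false, so the per-kernel register and local memory values are not lost.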

Thank you all for the answers.