Timing Data from Nsight Systems (from the sqlite report)

Hello

I have an executable which I am using to extract the data related to runtime of all the CUDA kernels that are involved in my application.

Looking at the data, I see that there are many of your internal kernels (for example pooling_fw_4d_kernel, offset_vector_kernel(float*, float*, float*, long, long, long, bool, int, int), etc.).

My question here is: If I have my kernel, say “myKernel” being invoked in the application at some point, and the data regarding its start and end (time in ns) can be found; does this start and end time take into account the internal kernels being used?

Also, in the sqlite report, each of my own kernels appears twice in the StringIds table when I try to match the kernel name against the shortName string id from the CUPTI_ACTIVITY_KIND_KERNEL table. For example, I see something like this:

In the table StringIds:

id || value

789 || myKernel(int* , double const*)
790 || myKernel

From the above snippet, which entry should I use to get the runtime of the kernel that I am looking for?

If I am looking for the start point in runtime of my application where myKernel started execution, including all the internal kernel calls that cuDNN or cuBLAS might have made, till the end when the kernel and all related activities to it were stopped (the end time of that kernel), which entry should I be using?

Please point me to the correct resource for this!

Much appreciated!

Thanks
Lakshay

My question here is: If I have my kernel, say “myKernel” being invoked in the application at some point, and the data regarding its start and end (time in ns) can be found; does this start and end time take into account the internal kernels being used?

Yes. The start and end times are “point in time” values from a continuously running realtime clock that mark the entry and exit times of the kernel. As such, the duration they represent (i.e. “end - start”) includes execution of internal kernels. It also includes time when a kernel may be blocked or waiting on other resources.

Also, in the sqlite report, each of my own kernels appears twice in the StringIds table […] which entry should I use to get the runtime of the kernel that I am looking for?

There are two columns of the CUPTI_ACTIVITY_KIND_KERNEL table that map to the StringIds table. One is shortName and the other is demangledName. As the names imply, shortName is the most reduced version of the kernel name, while demangledName shows the full specification, including argument types and any template types. This means each row in the CUPTI_ACTIVITY_KIND_KERNEL table maps to (typically) two different rows in the StringIds table. So in your example data here, StringIds.id = 790 (myKernel) likely maps to the shortName of your kernel execution trace(s), while .id = 789 (myKernel(int* , double const*)), with the more detailed information, likely maps to the demangledName of the same row(s). If there are multiple kernel executions using the same entry point, you’ll see multiple rows in the CUPTI_ACTIVITY_KIND_KERNEL table that map to the same rows of the StringIds table.

Basically, there are two parallel “many-to-one” mappings between the CUPTI_ACTIVITY_KIND_KERNEL and the StringIds table. So I think the answer to your question is “both” or “either.”
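To make the two parallel mappings concrete, here is a minimal Python sketch that builds a toy in-memory database and resolves both name columns. It is only an illustration: the table is cut down to four columns, the ids 789 and 790 are taken from the question (789 for the full signature, 790 for the bare name), and the timestamps are invented.

```python
import sqlite3

# Toy stand-in for an exported report; only the columns discussed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
        start INTEGER, "end" INTEGER, shortName INTEGER, demangledName INTEGER
    );
    INSERT INTO StringIds VALUES
        (789, 'myKernel(int* , double const*)'),
        (790, 'myKernel');
    -- Two launches of the same kernel; timestamps (ns) are invented.
    INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES
        (1000, 1500, 790, 789),
        (2000, 2700, 790, 789);
""")

# Each kernel row resolves to two StringIds rows, one per name column.
rows = conn.execute("""
    SELECT kern.shortName, sn.value, kern.demangledName, dn.value
    FROM CUPTI_ACTIVITY_KIND_KERNEL AS kern
    JOIN StringIds AS sn ON sn.id = kern.shortName
    JOIN StringIds AS dn ON dn.id = kern.demangledName
    ORDER BY kern.start
""").fetchall()

for row in rows:
    print(row)
```

Both launches point at the same pair of StringIds rows, which is the “many-to-one” relationship in each direction.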

If I am looking for the start point in runtime of my application where myKernel started execution, including all the internal kernel calls that cuDNN or cuBLAS might have made, till the end when the kernel and all related activities to it were stopped (the end time of that kernel), which entry should I be using?

Given that there are two matching rows in StringIds, you might use either or both, depending on the level of filtering you need. To see the two tables fully joined together, you might use a query like this:

SELECT
  kern.*,
  sid_sn.value AS shortNameStr,
  sid_dn.value AS demangledNameStr
FROM
  CUPTI_ACTIVITY_KIND_KERNEL AS kern
JOIN
  StringIds AS sid_sn
  ON sid_sn.id = kern.shortName
JOIN
  StringIds AS sid_dn
  ON sid_dn.id = kern.demangledName
;
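
If you only want the timing rows for one kernel, the same join can be filtered by name. The following is a rough Python sketch rather than an official recipe: kernel_times and make_demo_db are hypothetical helper names, the demo table is cut down to four columns, and the timestamps are invented. The pooling_fw_4d_kernel row is there only to show that other kernels are separate rows and are filtered out by the name match.

```python
import sqlite3

def kernel_times(conn, short_name):
    """Return (start_ns, end_ns, duration_ns) per launch whose short name matches."""
    query = """
        SELECT kern.start, kern."end", kern."end" - kern.start
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS kern
        JOIN StringIds AS sid_sn ON sid_sn.id = kern.shortName
        WHERE sid_sn.value = ?
        ORDER BY kern.start
    """
    return conn.execute(query, (short_name,)).fetchall()

def make_demo_db():
    """Build a toy in-memory database; all values here are made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
            start INTEGER, "end" INTEGER, shortName INTEGER, demangledName INTEGER
        );
        INSERT INTO StringIds VALUES
            (789, 'myKernel(int* , double const*)'),
            (790, 'myKernel'),
            (791, 'pooling_fw_4d_kernel');
        INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES
            (1000, 1500, 790, 789),
            (1600, 1900, 791, 791),
            (2000, 2700, 790, 789);
    """)
    return conn

demo = make_demo_db()
times = kernel_times(demo, "myKernel")
for start_ns, end_ns, duration_ns in times:
    print(start_ns, end_ns, duration_ns)
```

Against a real report you would pass a connection opened with sqlite3.connect on your own exported file instead of the demo database. To match on the full signature, join on demangledName and compare against the longer string.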