Using CUpti_ActivityKernel4 to find the start and end time in ns for a kernel wrapped in a function

Hello,

I read the documentation for the timestamp variables in CUpti_ActivityKernel4, which are:

  1. uint64_t CUpti_ActivityKernel4::start

  2. uint64_t CUpti_ActivityKernel4::end

I have a use case to find these timestamps for a function which is supposed to run multiple CUDA kernels inside it.

For example:

...
for (int i = 0; i < N; i++) {
    // want the "start" and "end" timestamps for the CUDA kernels called
    // inside this function at each iteration of the for loop
    funcWithMultipleCudaKernels(inputs);
}
...

An abstract definition of the function “funcWithMultipleCudaKernels(inputs)” looks similar to the following:

void funcWithMultipleCudaKernels(float inputs) {
    cudaMalloc();
    cudaMemcpy();

    kernel1<<<x, y, z>>>();
    kernel2<<<x, y, z>>>();
    ...
    cudaFree();
}

In addition to the kernels, it would also be helpful to get profiling information for all other CUDA activities within funcWithMultipleCudaKernels, such as memcpy and allocation operations, so that I can compute the total execution time of each iteration.

Similarly, I am looking for the resources used by the kernels inside the function, such as the number of registers per thread or the dynamic shared memory, so that I can report the total resources used by funcWithMultipleCudaKernels at each iteration.

Please provide your insights for the above.

Thank you
Lakshay

Basically, I am looking for an example of how to do what I described above. A code snippet would be a great help.

Thanks
Lakshay

Can anyone from NVIDIA have a look at this please?

Thanks

Lakshay

All the information you are looking for can be collected using Nsight Systems or nvprof.

It can also be collected using CUPTI, but that requires writing more code. Refer to the CUPTI sample activity_trace_async.
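As a rough illustration of the per-iteration bookkeeping, here is a minimal sketch in plain C++. It assumes you have already copied the kernel records out of the CUPTI activity buffers (the sample shows how to do that with cuptiActivityGetNextRecord) and that you recorded a begin/end timestamp around each loop iteration, for example with cuptiGetTimestamp after synchronizing the device. KernelRecord, IterationWindow, IterationSummary and summarize are hypothetical names made up for this sketch; they are not part of CUPTI. KernelRecord only mirrors the start, end, registersPerThread and dynamicSharedMemory fields of CUpti_ActivityKernel4.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the CUpti_ActivityKernel4 fields used in this sketch.
struct KernelRecord {
    uint64_t start;               // kernel start timestamp, ns
    uint64_t end;                 // kernel end timestamp, ns
    uint16_t registersPerThread;  // registers per thread
    int32_t  dynamicSharedMemory; // dynamic shared memory, bytes
};

// Hypothetical: timestamps taken just before and just after one loop iteration.
struct IterationWindow {
    uint64_t begin;
    uint64_t end;
};

// Hypothetical per-iteration totals.
struct IterationSummary {
    uint32_t kernelCount = 0;
    uint64_t kernelTimeNs = 0;             // sum of (end - start) over kernels
    uint64_t firstStart = 0;               // earliest kernel start in the window
    uint64_t lastEnd = 0;                  // latest kernel end in the window
    uint32_t maxRegistersPerThread = 0;
    uint64_t dynamicSharedMemoryBytes = 0; // summed over kernels
};

// Assign each kernel to the window that contains its start timestamp
// and accumulate the totals for that window.
std::vector<IterationSummary> summarize(const std::vector<KernelRecord>& records,
                                        const std::vector<IterationWindow>& windows) {
    std::vector<IterationSummary> out(windows.size());
    for (size_t i = 0; i < windows.size(); ++i) {
        const IterationWindow& w = windows[i];
        assert(w.begin <= w.end);
        IterationSummary& s = out[i];
        for (const KernelRecord& r : records) {
            if (r.start < w.begin || r.start >= w.end) continue;
            if (s.kernelCount == 0 || r.start < s.firstStart) s.firstStart = r.start;
            s.lastEnd = std::max(s.lastEnd, r.end);
            s.kernelTimeNs += r.end - r.start;
            s.maxRegistersPerThread =
                std::max<uint32_t>(s.maxRegistersPerThread, r.registersPerThread);
            s.dynamicSharedMemoryBytes += static_cast<uint64_t>(r.dynamicSharedMemory);
            ++s.kernelCount;
        }
    }
    return out;
}
```

Note that kernelTimeNs is a sum of individual kernel durations, so if kernels overlap on different streams it can exceed the wall-clock span lastEnd - firstStart. The same bucketing could be applied to memcpy records if you also want those in the per-iteration total.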

Thank you for your reply.

Can you please tell me where I can find libcudnn.so.7? I am using CUDA 10.1 on Linux, and it appears that the location of this library file has changed. Because of this, I am unable to profile any program.

Please let me know!

Thanks
Lakshay

Which profiling tool are you using?
Check your LD_LIBRARY_PATH setting. The “/usr/local/cuda-10.1/lib64” directory should be included.
Does your application run correctly without profiling?
Please provide the exact error message you are getting.

Hello

Starting with CUDA 10.1, the location of libcudnn.so.7 has changed. The path you mention does not contain libcudnn.so.7.

Please confirm and let me know. I saw a link that stated this in the past, but I am unable to find it again.

Thanks
Lakshay

Note that cuDNN is not part of the CUDA Toolkit installer. You need to install cuDNN separately. The location for libcudnn.so.7 has not changed.