Hello,
I read the documentation for the inherited variables in CUpti_ActivityKernel4, which are:
- uint64_t CUpti_ActivityKernel4::start
- uint64_t CUpti_ActivityKernel4::end
My use case is to collect these timestamps for a function that launches multiple CUDA kernels.
For example:
...
for (int i = 0; i < N; i++) {
    // want the "start" and "end" timestamps for the CUDA kernels called
    // inside this function at each iteration of the for loop
    funcWithMultipleCudaKernels(inputs);
}
...
An abstract definition of the function “funcWithMultipleCudaKernels(inputs)” looks similar to the following:
void funcWithMultipleCudaKernels(float inputs){
    cudaMalloc();
    cudaMemcpy();
    kernel1<<<x,y,z>>>();
    kernel2<<<x,y,z>>>();
    ...
    cudaFree();
}
In addition to the kernels, it would also be helpful to get profiling information for all other CUDA activities inside funcWithMultipleCudaKernels, such as memcpy and allocation operations, so that I can compute the total execution time of each iteration.
Similarly, I am looking for the resources used by the kernels inside the function, such as the number of registers per thread or the dynamic shared memory, so that I can report the total resources used by funcWithMultipleCudaKernels at each iteration.
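To make the request concrete, here is a rough sketch of the per-iteration aggregation I have in mind. It does not call CUPTI at all: KernelRecord is a hypothetical plain struct standing in for the fields I would copy out of each CUpti_ActivityKernel4 record (the field types are simplified assumptions), and IterationSummary and summarize are names I made up for illustration. The open question is how to collect the records belonging to one iteration in the first place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the fields of interest in a kernel activity
// record. Field types are simplified and are an assumption.
struct KernelRecord {
    uint64_t start;               // kernel start timestamp
    uint64_t end;                 // kernel end timestamp
    uint32_t registersPerThread;  // registers used per thread
    int32_t  dynamicSharedMemory; // dynamic shared memory, in bytes
};

// Hypothetical per-iteration totals for funcWithMultipleCudaKernels.
struct IterationSummary {
    uint64_t firstStart = 0;               // earliest kernel start
    uint64_t lastEnd = 0;                  // latest kernel end
    uint64_t busyTime = 0;                 // sum of (end - start) over kernels
    uint32_t maxRegistersPerThread = 0;    // largest per-thread register count
    int64_t  totalDynamicSharedMemory = 0; // sum over kernels, in bytes
};

// Fold all kernel records gathered during one loop iteration into a summary.
IterationSummary summarize(const std::vector<KernelRecord>& records) {
    IterationSummary s;
    bool first = true;
    for (const KernelRecord& r : records) {
        assert(r.end >= r.start);
        if (first) {
            s.firstStart = r.start;
            s.lastEnd = r.end;
            first = false;
        } else {
            s.firstStart = std::min(s.firstStart, r.start);
            s.lastEnd = std::max(s.lastEnd, r.end);
        }
        s.busyTime += r.end - r.start;
        s.maxRegistersPerThread =
            std::max(s.maxRegistersPerThread, r.registersPerThread);
        s.totalDynamicSharedMemory += r.dynamicSharedMemory;
    }
    return s;
}
```

Here lastEnd - firstStart would be the span covered by the kernels of one iteration, while busyTime counts only the time the kernels themselves were running. Memcpy and allocation activities would need a similar record type of their own.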
Please provide your insights for the above.
Thank you
Lakshay