I want to profile the instruction count of a CUDA program at regular intervals.
I found out that I can use Nsight, but I need to build my own profiling tool so that I can merge in some other data.
I think CUPTI can do this, but I don't know whether CUPTI allows profiling another process (the profiler and the CUDA kernel run in different programs).
So, is there any way to profile the instruction count of a CUDA kernel dynamically?