Hi all,
I am wondering if there is a way to get the number of instructions executed by each thread. The CUDA profiler only reports the total number of instructions (i.e., summed over all threads), not a per-thread count. I also checked CUPTI and found nothing. Thanks!
Bo