I want to measure the time elapsed at several points INSIDE a kernel, not how long the kernel execution takes overall.
Something like this:
some_kernel() {
    start timer
    do_something1
    <----- measure the time elapsed till here ----->
    do_something2
    <----- measure the time elapsed till here ----->
    do_something3
    <----- measure the time elapsed till here ----->
}
I somehow had in the back of my head that there was a CUDA function for this already, but apparently missed it when I quickly scanned the Programming Guide appendices to check.
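For what it's worth, one candidate is the device-side clock64() (and its 32-bit sibling clock()), which reads a per-multiprocessor cycle counter. Below is a rough sketch of how the pseudocode above might look with it, not a definitive implementation. The kernel body, the `elapsed` buffer, and the loop workloads are all made up for illustration. The conversion to microseconds assumes the clock rate reported by cudaDeviceGetAttribute matches the actual running clock, which may not hold under boost or throttling.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the three loops are stand-ins for
// do_something1/2/3, and `elapsed` is an illustrative output buffer.
__global__ void some_kernel(float *data, long long *elapsed)
{
    float x = data[threadIdx.x];

    long long start = clock64();                              // start timer

    for (int i = 0; i < 1000; ++i) x = x * 1.0001f + 0.5f;    // do_something1
    long long t1 = clock64();                                 // checkpoint 1

    for (int i = 0; i < 2000; ++i) x = x * 0.9999f + 0.25f;   // do_something2
    long long t2 = clock64();                                 // checkpoint 2

    for (int i = 0; i < 3000; ++i) x = x * 1.0002f - 0.125f;  // do_something3
    long long t3 = clock64();                                 // checkpoint 3

    // Store the result so the compiler cannot discard the loops.
    data[threadIdx.x] = x;

    // One thread reports cycles elapsed since `start` at each checkpoint.
    if (threadIdx.x == 0) {
        elapsed[0] = t1 - start;
        elapsed[1] = t2 - start;
        elapsed[2] = t3 - start;
    }
}

int main()
{
    const int n = 32;  // a single warp in a single block, for simplicity
    float *d_data = nullptr;
    long long *d_elapsed = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_elapsed, 3 * sizeof(long long));
    cudaMemset(d_data, 0, n * sizeof(float));

    some_kernel<<<1, n>>>(d_data, d_elapsed);

    long long h_elapsed[3] = {0, 0, 0};
    cudaError_t err = cudaMemcpy(h_elapsed, d_elapsed, sizeof(h_elapsed),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Clock rate in kHz; treating it as constant is an assumption.
    int clock_khz = 0;
    cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, 0);

    for (int i = 0; i < 3; ++i) {
        printf("checkpoint %d: %lld cycles", i + 1, h_elapsed[i]);
        if (clock_khz > 0)
            printf(" (~%.3f us)", h_elapsed[i] * 1000.0 / clock_khz);
        printf("\n");
    }

    cudaFree(d_data);
    cudaFree(d_elapsed);
    return 0;
}
```

The numbers are per-thread cycle counts on one multiprocessor, so they include any time the warp spent stalled or descheduled, and the compiler may move instructions across the clock64() reads. Treat them as approximate and compare relative values rather than absolute ones.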