I am trying to compare the time taken by individual CUDA threads. I have an array of 1,000,000 floats and an array of 1,000,000 clock_t values, and I tried the following approach, using clock_t to count the number of clock cycles:
cudaEvent timing is done from the main (host) function, e.g. to measure the time taken by a CUDA call. What I am looking for is the time taken by each individual thread, even for threads inside the same block.
I took a look at the clock example; they’re doing the same thing I am, so I guess it’s correct. I just have to understand what the numbers represent.
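For what the numbers represent: clock() and clock64() read a per-multiprocessor counter that advances once per clock cycle, so the difference between two reads is a cycle count, not a time. A rough way to turn cycles into milliseconds is to divide by the clock rate in kHz. The sketch below assumes device 0 and uses the nominal rate reported by cudaDeviceGetAttribute; the helper name cycles_to_ms and the sample cycle count are made up for illustration, and the result is only approximate because the actual clock can vary at run time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a cycle count to milliseconds.
// A rate in kHz is cycles per millisecond, so the division gives ms directly.
double cycles_to_ms(long long cycles, int clock_khz) {
    return static_cast<double>(cycles) / static_cast<double>(clock_khz);
}

int main() {
    int clock_khz = 0;
    // Nominal SM clock rate of device 0, in kHz (assumption: device 0 is the one in use).
    cudaError_t err = cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    long long example_cycles = 1500000;  // made-up value, standing in for a measured difference
    printf("clock rate: %d kHz\n", clock_khz);
    printf("%lld cycles is roughly %f ms\n", example_cycles,
           cycles_to_ms(example_cycles, clock_khz));
    return 0;
}
```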
I understand, but in my case I need to measure the time taken by each individual thread (some threads should take much longer than others), so I don’t need the max-min step.
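A minimal sketch of per-thread timing along these lines: each thread reads clock64() before and after its own work and stores the difference in its own slot, with no reduction step. The kernel name, the placeholder workload, and the helper blocks_for are assumptions for illustration, not the code from the original post. Two caveats: the difference counts the cycles that elapsed while the thread was resident, including time spent waiting while other warps ran, and threads of the same warp that diverge will tend to report similar values because they wait for each other. The compiler may also move instructions across the clock reads, so the numbers are worth sanity-checking.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread times its own work and writes its own elapsed cycle count.
__global__ void timed_kernel(const float* in, float* out, long long* cycles, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    long long start = clock64();

    // Placeholder workload: thread i iterates (i % 64) + 1 times,
    // so different threads do different amounts of work.
    float x = in[i];
    int iters = (i % 64) + 1;
    for (int k = 0; k < iters; ++k) x = x * 1.0001f + 0.5f;
    out[i] = x;  // store the result so the loop is not optimized away

    long long stop = clock64();
    cycles[i] = stop - start;
}

// Hypothetical helper: number of blocks needed to cover n elements.
int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

int main() {
    const int n = 1000000;
    const int threads = 256;

    std::vector<float> h_in(n, 1.0f);
    std::vector<long long> h_cycles(n);

    float *d_in = nullptr, *d_out = nullptr;
    long long* d_cycles = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_cycles, n * sizeof(long long));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    timed_kernel<<<blocks_for(n, threads), threads>>>(d_in, d_out, d_cycles, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_cycles.data(), d_cycles, n * sizeof(long long), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 4; ++i)
        printf("thread %d: %lld cycles\n", i, h_cycles[i]);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_cycles);
    return 0;
}
```

Using long long with clock64() instead of clock_t with clock() avoids wraparound of the 32-bit counter on longer-running kernels.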