How to measure the time elapsed (or number of clock cycles) between the start and the end of a CUDA thread

Hi everyone,

I am trying to compare the time taken by individual CUDA threads. I have an array of 1000000 floats and an array of 1000000 clock_t values, and I tried the following approach, using clock() to count the number of clock cycles:

__global__ void kernel (..........., clock_t* d_clocks){

int idx=..........;

clock_t start=clock();




Then I call cudaThreadSynchronize() right after the kernel launch and copy the values of d_clocks to a host array called clocks.

It outputs large numbers at the beginning, but from idx=1009 onward it always outputs 1!
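For comparison, here is a minimal sketch of the per-thread pattern described above. It is not the original kernel: the workload, the names `timed_kernel`, `check` and `measure_thread_cycles`, and the launch configuration are all hypothetical. It also swaps `clock()` for `clock64()` (a 64-bit counter, less likely to wrap) and `cudaThreadSynchronize()` for its replacement `cudaDeviceSynchronize()`. The bounds guard and the error checks are there because a failed launch, or threads that never write their slot, would leave whatever was already in the output array.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical workload: the iteration count depends on idx, so some
// threads have more work to do than others.
__global__ void timed_kernel(float* data, long long* d_clocks, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;                    // skip threads past the end

    long long start = clock64();             // per-multiprocessor cycle counter
    float x = data[idx];
    int iters = (idx % 64 + 1) * 100;
    for (int i = 0; i < iters; ++i) x = x * 1.0001f + 0.5f;
    data[idx] = x;                           // store so the loop is not optimized away
    long long stop = clock64();

    d_clocks[idx] = stop - start;            // elapsed cycles seen by this thread
}

// Abort with a message if a runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(1);
    }
}

// Runs the kernel on n elements and returns one cycle count per thread.
std::vector<long long> measure_thread_cycles(int n) {
    float* d_data = nullptr;
    long long* d_clocks = nullptr;
    check(cudaMalloc(&d_data, n * sizeof(float)), "cudaMalloc data");
    check(cudaMalloc(&d_clocks, n * sizeof(long long)), "cudaMalloc clocks");
    check(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset data");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    timed_kernel<<<blocks, threads>>>(d_data, d_clocks, n);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    std::vector<long long> clocks(n);
    check(cudaMemcpy(clocks.data(), d_clocks, n * sizeof(long long),
                     cudaMemcpyDeviceToHost), "cudaMemcpy clocks");
    cudaFree(d_data);
    cudaFree(d_clocks);
    return clocks;
}

int main() {
    std::vector<long long> clocks = measure_thread_cycles(1000000);
    for (int i = 0; i < 4; ++i)
        printf("thread %d: %lld cycles\n", i, clocks[i]);
    return 0;
}
```

One caveat when reading the numbers: threads of the same warp execute together, so the values within a warp may end up close to that of its slowest thread rather than showing each thread's own work.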

Also, is there a way to measure the time inside a kernel in milliseconds, in Debug mode rather than EmuDebug mode?
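On the milliseconds part: one possible approach, sketched here under assumptions, is to keep counting cycles in the kernel and convert on the host using the device clock rate. The helper `cycles_to_ms` is hypothetical. The rate is queried with `cudaDeviceGetAttribute` and `cudaDevAttrClockRate`, which reports kHz. Since this is a nominal rate and the actual clock can vary while the kernel runs, treat the result as an estimate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. A rate in kHz is a number of cycles per
// millisecond, so milliseconds = cycles / kHz.
double cycles_to_ms(long long cycles, int clock_rate_khz) {
    return static_cast<double>(cycles) / clock_rate_khz;
}

int main() {
    int device = 0;
    int khz = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err == cudaSuccess)
        err = cudaDeviceGetAttribute(&khz, cudaDevAttrClockRate, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "clock rate query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    long long cycles = 1000000;  // example value, e.g. one entry copied back from d_clocks
    printf("clock rate: %d kHz\n", khz);
    printf("%lld cycles is roughly %f ms\n", cycles, cycles_to_ms(cycles, khz));
    return 0;
}
```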


The CUDA SDK contains examples that show how to do time measurement.

Look at cudaEvent in the user’s manual.
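A minimal sketch of event-based timing, assuming a hypothetical kernel `scale` and helper `time_kernel_ms`. Note that it times the whole kernel from the host side, not individual threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only as something to time.
__global__ void scale(float* data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= 2.0f;
}

// Returns the kernel's elapsed time in milliseconds, or -1.0f on error.
float time_kernel_ms(int n) {
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                       // marker before the kernel
    scale<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaError_t launch_err = cudaGetLastError();  // catches a bad launch
    cudaEventRecord(stop);                        // marker after the kernel
    cudaEventSynchronize(stop);                   // wait until 'stop' is reached

    float ms = -1.0f;
    if (launch_err != cudaSuccess ||
        cudaEventElapsedTime(&ms, start, stop) != cudaSuccess) {
        ms = -1.0f;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main() {
    printf("kernel time: %f ms\n", time_kernel_ms(1 << 20));
    return 0;
}
```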

See the clock sample in the SDK to learn how to use it.

You need to record the start time and end time of each thread, then find the minimum start and the maximum end among those. The final value is max - min.
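The max - min idea can be sketched as follows. The names `record_times` and `block_cycles` and the workload are hypothetical. The sketch launches a single block, on the assumption that start and end values are only directly comparable when they come from the same multiprocessor's counter.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread records its own start and end counter values.
__global__ void record_times(float* data, long long* starts, long long* stops) {
    int tid = threadIdx.x;
    long long t0 = clock64();
    float x = data[tid];
    for (int i = 0; i < 1000; ++i) x = x * 1.0001f + 0.5f;
    data[tid] = x;                 // store so the loop is not optimized away
    long long t1 = clock64();
    starts[tid] = t0;              // write timestamps after the timed region
    stops[tid] = t1;
}

// Returns (latest end) - (earliest start) for one block, or -1 on error.
long long block_cycles(int threads) {
    float* d_data = nullptr;
    long long* d_starts = nullptr;
    long long* d_stops = nullptr;
    if (cudaMalloc(&d_data, threads * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_starts, threads * sizeof(long long)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_stops, threads * sizeof(long long)) != cudaSuccess) return -1;
    cudaMemset(d_data, 0, threads * sizeof(float));

    record_times<<<1, threads>>>(d_data, d_starts, d_stops);
    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    std::vector<long long> starts(threads), stops(threads);
    cudaMemcpy(starts.data(), d_starts, threads * sizeof(long long), cudaMemcpyDeviceToHost);
    cudaMemcpy(stops.data(), d_stops, threads * sizeof(long long), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_starts);
    cudaFree(d_stops);

    long long min_start = *std::min_element(starts.begin(), starts.end());
    long long max_stop = *std::max_element(stops.begin(), stops.end());
    return max_stop - min_start;
}

int main() {
    printf("block took %lld cycles\n", block_cycles(256));
    return 0;
}
```

This gives one number for the whole block. The per-thread values are simply `stops[i] - starts[i]`, without the min/max step.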

cudaEvent is used in the main (host) function (e.g. to measure the time taken by a CUDA call), whereas what I am looking for is the time taken by each individual thread, even within the same block.

I took a look at the clock example. It does the same thing that I am doing, so I guess my approach is correct; I just have to understand what the numbers represent.

I understand, but in my case I need to measure the time taken by each individual thread (some threads should take a lot more time than the others), so I don’t need the max - min step.

Thanks all, I’ll try again to figure out the meaning of the numbers.