Measuring running time

Hi, I’m new to CUDA and I want to measure how long a kernel function takes to run. I have a simple program that takes an array as input and squares each element MULVALUE times, where MULVALUE is a constant. The kernel call is the following:

// I want to measure the time between this point

squareArray <<< blockNumber, blockSize >>>(a_device, LENGTH);

// and this point

a_device is a float array in device memory and LENGTH is its length.

The kernel definition is the following:

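#ifndef _SQUARE_KERNEL_H_
#define _SQUARE_KERNEL_H_

#define MULVALUE 3

// Squares each element of a in place, MULVALUE times
__global__ void squareArray(float *a, int length)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;

  if (idx < length)
  {
    for (int i = 0; i < MULVALUE; i++)
      a[idx] = a[idx] * a[idx];
  }
}

// Kernel that does nothing
__global__ void empty()
{
}

#endif // #ifndef _SQUARE_KERNEL_H_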
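To make the arithmetic explicit: the loop squares a[idx] in place, so after MULVALUE iterations each element equals its original value raised to the power 2^MULVALUE (with MULVALUE = 3, 2.0f becomes 4, then 16, then 256). A minimal host-side reference that applies the same computation, useful for checking the GPU result after copying it back, might look like this. The function name squareArrayReference is hypothetical and not part of the original program.

```cuda
// Same constant as in the question's header; only defined here if missing.
#ifndef MULVALUE
#define MULVALUE 3
#endif

// Hypothetical CPU reference for checking squareArray's output: applies the
// same in-place repeated squaring to every element, one element at a time.
void squareArrayReference(float *a, int length)
{
  for (int idx = 0; idx < length; idx++)
    for (int i = 0; i < MULVALUE; i++)
      a[idx] = a[idx] * a[idx];
}
```

Running it on a host copy of the input and comparing against the array copied back from a_device is a quick way to confirm the kernel did the work you are timing.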

How can I do that?

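You should use CUDA events if you want to measure the time the GPU spends executing your kernel. Kernel launches are asynchronous, so a plain host-side timer around the launch only measures how long it takes to queue the kernel unless you synchronize first. Events are recorded into the same stream as the kernel, so the elapsed time between them reflects when the GPU actually reached each point.

Try something like this: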

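cudaEvent_t start, stop;
float time;

// Events must be created before they can be recorded
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);

// The kernel launch being timed
squareArray <<< blockNumber, blockSize >>>(a_device, LENGTH);

cudaEventRecord(stop, 0);
cudaEventSynchronize(stop); // Block until the stop event is actually recorded

cudaEventElapsedTime(&time, start, stop); // Elapsed time in milliseconds
printf("Done in %f ms.\n", time);

cudaEventDestroy(start);
cudaEventDestroy(stop);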
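A single launch can be noisy, and the first launch in a process may also include one-off startup costs. A common refinement is one untimed warm-up launch followed by several timed launches whose total is divided by the repetition count. The sketch below assumes the squareArray kernel from the question; the helper name timeSquareArray, the block size of 256 and the error-handling convention (returning a negative value) are hypothetical choices. Because squareArray works in place, repeated launches keep squaring the same data, so the array contents are not meaningful afterwards.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define MULVALUE 3

__global__ void squareArray(float *a, int length)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < length)
  {
    for (int i = 0; i < MULVALUE; i++)
      a[idx] = a[idx] * a[idx];
  }
}

// Hypothetical helper: returns the average time in ms of `reps` launches of
// squareArray on a_device, or a negative value on bad input or a CUDA error.
float timeSquareArray(float *a_device, int length, int reps)
{
  if (reps <= 0 || length <= 0)
    return -1.0f;

  const int blockSize = 256;                                    // assumed block size
  const int blockNumber = (length + blockSize - 1) / blockSize; // enough blocks to cover length

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  // Untimed warm-up launch absorbs one-time startup costs
  squareArray<<<blockNumber, blockSize>>>(a_device, length);

  cudaEventRecord(start, 0);
  for (int r = 0; r < reps; r++)
    squareArray<<<blockNumber, blockSize>>>(a_device, length);
  cudaEventRecord(stop, 0);
  cudaEventSynchronize(stop); // wait until every launch before `stop` has finished

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);

  // cudaGetLastError reports the last error produced by any runtime call
  // or kernel launch in this host thread (and resets it)
  cudaError_t err = cudaGetLastError();
  cudaEventDestroy(start);
  cudaEventDestroy(stop);

  if (err != cudaSuccess)
  {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return -1.0f;
  }
  return ms / reps;
}
```

For example, once a_device has been allocated with cudaMalloc and filled, float ms = timeSquareArray(a_device, LENGTH, 100); gives the average per-launch time in milliseconds.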