Measuring running time

Hi, I’m new to CUDA and I want to measure how long a kernel function takes to run. I have a simple program that takes an array as input and multiplies each element by itself MULVALUE times, where MULVALUE is a constant. The kernel call is the following:

// I want to measure the time between this point

squareArray <<< blockNumber, blockSize >>>(a_device, LENGTH);

// and this point

a_device is a float array and LENGTH is the array length.

The kernel definition is the following:

#define MULVALUE 3

#ifndef _SQUARE_KERNEL_H_
#define _SQUARE_KERNEL_H_

__global__ void squareArray(float *a, int length)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < length)
  {
    for (int i = 0; i < MULVALUE; i++)
      a[idx] = a[idx] * a[idx];
  }
}

__global__ void empty()
{
}

#endif // #ifndef _SQUARE_KERNEL_H_

How can I do that?

You should use CUDA events if you want to measure the time the GPU spends executing your kernel.

Try something like:

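cudaEvent_t start, stop;
float time; // elapsed time in milliseconds

cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0); // enqueue the start event in stream 0
hereYourKernel<<<...>>>(...);
cudaEventRecord(stop, 0); // enqueue the stop event after the kernel

cudaEventSynchronize(stop); // Block until the stop event is actually recorded

cudaEventElapsedTime(&time, start, stop);
printf("Done in %f ms.\n", time);

// Release the events once the measurement is done
cudaEventDestroy(start);
cudaEventDestroy(stop);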
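
For reference, here is a minimal, self-contained sketch that applies the event pattern to the squareArray kernel from the question. The array length (1 << 20), the block size of 256, the all-ones input data, the CUDA_CHECK helper macro and the untimed warm-up launch are assumptions added for illustration; they do not come from the original posts, so adjust them to your own program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define MULVALUE 3
#define LENGTH (1 << 20)  // assumed array length, not from the original post

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Same kernel as in the question: squares each element MULVALUE times.
__global__ void squareArray(float *a, int length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < length) {
        for (int i = 0; i < MULVALUE; i++)
            a[idx] = a[idx] * a[idx];
    }
}

int main()
{
    const int blockSize = 256;  // assumed block size
    const int blockNumber = (LENGTH + blockSize - 1) / blockSize;  // round up
    const size_t bytes = LENGTH * sizeof(float);

    // Fill the host array with 1.0f so repeated squaring cannot overflow.
    float *a_host = (float *)malloc(bytes);
    if (a_host == NULL) {
        fprintf(stderr, "Host allocation failed\n");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < LENGTH; i++)
        a_host[i] = 1.0f;

    float *a_device = NULL;
    CUDA_CHECK(cudaMalloc((void **)&a_device, bytes));
    CUDA_CHECK(cudaMemcpy(a_device, a_host, bytes, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    // Untimed warm-up launch so one-time startup costs are not measured.
    squareArray<<<blockNumber, blockSize>>>(a_device, LENGTH);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    // Timed launch: the events bracket the kernel in the default stream.
    CUDA_CHECK(cudaEventRecord(start, 0));
    squareArray<<<blockNumber, blockSize>>>(a_device, LENGTH);
    CUDA_CHECK(cudaEventRecord(stop, 0));
    CUDA_CHECK(cudaGetLastError());          // catch launch errors
    CUDA_CHECK(cudaEventSynchronize(stop));  // wait until stop is recorded

    float elapsedMs = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&elapsedMs, start, stop));
    printf("squareArray took %f ms.\n", elapsedMs);

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(a_device));
    free(a_host);
    return 0;
}
```

Assuming the file is saved as timing.cu (a name picked for this example), it should build with nvcc timing.cu -o timing. The time reported by cudaEventElapsedTime is in milliseconds and covers only the work enqueued between the two events, so the host-to-device copy is not included. A single launch of such a small kernel can give noisy numbers; timing several launches and averaging is a common way to reduce that.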

Regards,

Daniele