Hi, I'm new to CUDA and I want to measure how long a kernel takes to run. I have a simple program that takes an array as input and squares each element MULVALUE times, where MULVALUE is a constant. The kernel call is the following:
// I want to measure the time between this point
squareArray <<< blockNumber, blockSize >>>(a_device, LENGTH);
// and this point
Here a_device is a float array and LENGTH is the array length.
The kernel definition is the following:
#define MULVALUE 3
#ifndef _SQUARE_KERNEL_H_
#define _SQUARE_KERNEL_H_
__global__ void squareArray(float *a, int length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < length)
    {
        for (int i = 0; i < MULVALUE; i++)
            a[idx] = a[idx] * a[idx];
    }
}
__global__ void empty()
{
}
#endif // #ifndef _SQUARE_KERNEL_H_
How can I do that?
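For context, one approach that is commonly used for this is to bracket the launch with CUDA events from the runtime API. The following is only a minimal sketch of that idea, not a definitive answer: the helper name timeSquareArrayMs and its -1.0f error convention are hypothetical and not part of my program, and the kernel is repeated only so the snippet compiles on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question, repeated so this sketch is self-contained.
#define MULVALUE 3
__global__ void squareArray(float *a, int length)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < length)
    {
        for (int i = 0; i < MULVALUE; i++)
            a[idx] = a[idx] * a[idx];
    }
}

// Hypothetical helper (an assumption, not in the original program):
// launches squareArray once and returns the elapsed GPU time in
// milliseconds, or -1.0f if an event call or the launch failed.
float timeSquareArrayMs(float *a_device, int length, int blockNumber, int blockSize)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess)
        return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess)
    {
        cudaEventDestroy(start);
        return -1.0f;
    }

    // Events are recorded in the same (default) stream as the kernel,
    // so they mark the points just before and just after it on the GPU.
    cudaEventRecord(start, 0);
    squareArray<<<blockNumber, blockSize>>>(a_device, length);
    cudaEventRecord(stop, 0);

    // The launch is asynchronous: wait until the stop event has completed.
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

With the variables from my host code, the call would look like float ms = timeSquareArrayMs(a_device, LENGTH, blockNumber, blockSize); followed by printing ms. The first launch in a program may include one-time startup overhead, so running a warm-up launch first (for example the empty kernel) and averaging several timed runs would probably give more representative numbers.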