Hi,
I’m trying to measure how long my kernel takes to run, using the following code:
//START CLOCK COUNTER
clock_t timer_start = clock();
//ENTER KERNEL
my_kernel <<< dimGrid, dimBlock >>> (args);
checkCUDAError("kernel");
//EXIT KERNEL
//STOP AND PRINT THE TIMER
cudaThreadSynchronize();
timer_end = clock();
printf("start cicles : %ld\n", timer_start);
printf("end cicles : %ld\n", timer_end);
timer_diff = static_cast<long double>( (timer_end - timer_start + 0.0) / CLOCKS_PER_SEC);
printf("my Kernel, iteration %d: %.12Lf seconds\n",i, timer_diff);
But timer_start and timer_end almost always hold the same number of cycles, so the last line usually prints 0.000…0
In some stranger cases there is a difference of 1000 cycles between start and stop, and the last line prints 0.010…0
I’m sure something is wrong, because the clock has to advance by at least one tick while the kernel executes.
The usual output I get is this:
start cicles : 120000
end cicles : 120000
Kernel, iteration 0: 0.000000000000 seconds
start cicles : 130000
end cicles : 130000
Kernel, iteration 1: 0.000000000000 seconds
start cicles : 130000
end cicles : 130000
Kernel, iteration 2: 0.000000000000 seconds
start cicles : 130000
end cicles : 130000
Kernel, iteration 3: 0.000000000000 seconds
start cicles : 130000
end cicles : 130000
Kernel, iteration 4: 0.000000000000 seconds
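One thing that may be worth ruling out is the granularity of clock() itself. It reports processor time, and its tick can be much coarser than CLOCKS_PER_SEC suggests. The readings above only move in steps of 10000, which looks consistent with that, though this is a guess. Below is a minimal sketch for comparison, assuming C++11 is available on the host side. It takes a wall-clock reading from a monotonic clock around a piece of work. The names elapsed_seconds and fake_kernel_and_sync are hypothetical and only for illustration. In the real program the work would be the kernel launch followed by the synchronize call, as in the code above.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical helper (not part of CUDA): runs `work` once and returns the
// elapsed wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double elapsed_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();  // real program: launch the kernel, then wait for it to finish
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in for "launch my_kernel and synchronize" so the sketch builds
// without a GPU; it simply blocks the calling thread for about 20 ms.
void fake_kernel_and_sync() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

// Usage (inside any function):
//   double s = elapsed_seconds(fake_kernel_and_sync);
//   std::printf("elapsed: %.9f seconds\n", s);
```

If this reports a sensible time where clock() reports zero, the kernel is probably just shorter than one clock() tick. Timing many iterations in a single measurement and dividing by the count is another way to check.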
Can anybody help me?
Thanks in advance