Dear all,
I tried to measure the elapsed time of my kernel, which sorts an array, using the following code:
float time_kernel;
cudaEvent_t start_event, stop_event;
cudaEventCreate(&start_event);
cudaEventCreate(&stop_event);
mykernel<<<1,1>>>();
cudaEventRecord(start_event, 0);
cudaEventRecord(stop_event, 0);
cudaEventSynchronize(stop_event);
cudaEventElapsedTime(&time_kernel, start_event, stop_event);
printf("kernel:\t\t% 15.8ef\n", time_kernel);
The code runs and reports a time of about 2 ms. When I increase the array size, I expect the elapsed time to increase, and the program does visibly take longer to print its results in the console window, yet the reported time stays near 2 ms.
When I run Nsight, the capture time is more than 2 ms, as expected.
Can I rely on the Nsight capture time to represent the elapsed time?
And how can I fix the code above so that it no longer reports about 2 ms regardless of the array size?
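
Looking at the code again, one possible cause is that mykernel is launched before start_event is recorded, so the two events are recorded back to back and the kernel may fall outside the timed interval. Below is a minimal sketch of the ordering I believe is intended. The empty mykernel body is only a placeholder for my sorting kernel, and the cleanup calls at the end are an addition that is not in my original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder (assumption): stands in for the real sorting kernel.
__global__ void mykernel() {}

int main() {
    float time_kernel = 0.0f;
    cudaEvent_t start_event, stop_event;
    cudaEventCreate(&start_event);
    cudaEventCreate(&stop_event);

    cudaEventRecord(start_event, 0);   // record the start marker before the launch
    mykernel<<<1, 1>>>();              // the work to be timed
    cudaEventRecord(stop_event, 0);    // record the stop marker after the launch
    cudaEventSynchronize(stop_event);  // block the host until the stop event completes

    // Elapsed time between the two events, in milliseconds.
    cudaEventElapsedTime(&time_kernel, start_event, stop_event);
    printf("kernel:\t\t%15.8e ms\n", time_kernel);

    cudaEventDestroy(start_event);
    cudaEventDestroy(stop_event);
    return 0;
}
```

With this ordering both events are in the same stream as the kernel (stream 0), so the measured interval should cover the kernel's execution. I have not yet confirmed that this accounts for the constant 2 ms reading.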