I tried to measure the elapsed time of my kernel, which sorts an array, using the following code:

    float time_kernel;
    cudaEvent_t start_event, stop_event;
    cudaEventCreate(&start_event);
    cudaEventCreate(&stop_event);
    mykernel<<<1,1>>>();
    cudaEventRecord(start_event, 0);
    cudaEventRecord(stop_event, 0);
    cudaEventSynchronize(stop_event);
    cudaEventElapsedTime(&time_kernel, start_event, stop_event);
    printf("kernel:\t\t% 15.8ef\n", time_kernel);
The code runs and reports a time of about 2 ms. When I increase the array size, I expect the elapsed time to increase, and it does seem to: the console window takes noticeably longer before it prints the results. However, the reported time still stays near 2 ms.
When I run Nsight, the capture time is more than 2 ms, as expected.
Can I rely on the Nsight capture time to represent the elapsed time?
And how can I fix the 2 ms problem in the code above?
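In the code above, both `cudaEventRecord` calls are issued after `mykernel<<<1,1>>>()` has been launched, so the interval between the two events does not include the kernel's execution. The usual event-timing pattern records the start event before the launch and the stop event after it, then waits on the stop event. The sketch below shows that ordering. It assumes a hypothetical single-thread bubble-sort kernel `sortKernel` standing in for `mykernel` (whose body the question does not show), and a hypothetical helper `timeSortKernel`; neither comes from the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for mykernel: a single-thread bubble sort,
// used only so the kernel does real work that grows with n.
__global__ void sortKernel(int* data, int n)
{
    for (int i = 0; i < n - 1; ++i)
        for (int j = 0; j < n - 1 - i; ++j)
            if (data[j] > data[j + 1]) {
                int tmp = data[j];
                data[j] = data[j + 1];
                data[j + 1] = tmp;
            }
}

// Hypothetical helper: times one launch of sortKernel on n ints and
// returns milliseconds; *sorted reports whether the result is ordered.
float timeSortKernel(int n, bool* sorted)
{
    int* h = (int*)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) h[i] = n - i;   // descending input (worst case)

    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);        // enqueue start BEFORE the launch
    sortKernel<<<1, 1>>>(d, n);
    cudaError_t err = cudaGetLastError();   // catches launch-configuration errors
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    cudaEventRecord(stop, 0);         // enqueue stop AFTER the launch
    cudaEventSynchronize(stop);       // block the host until stop has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Copy back and verify the sort.
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    *sorted = true;
    for (int i = 1; i < n; ++i)
        if (h[i - 1] > h[i]) { *sorted = false; break; }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return ms;
}

int main()
{
    // Double the array size each time; the measured time should grow.
    for (int n = 1 << 9; n <= 1 << 12; n <<= 1) {
        bool ok = false;
        float ms = timeSortKernel(n, &ok);
        printf("n = %5d  kernel: %12.4f ms  sorted: %s\n", n, ms, ok ? "yes" : "no");
    }
    return 0;
}
```

Because the events now bracket the launch, the printed time should increase with `n`. The `cudaEventSynchronize(stop)` call is what makes the host wait; without it, `cudaEventElapsedTime` can return `cudaErrorNotReady` because the stop event has not completed yet.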