Capture time of Nsight

Dear all,

I tried to measure the elapsed time of my kernel, which sorts an array, using:

    float time_kernel;
    cudaEvent_t start_event, stop_event;
    cudaEventCreate(&start_event);
    cudaEventCreate(&stop_event);

    mykernel<<<1,1>>>();

    cudaEventRecord(start_event, 0);
    cudaEventRecord(stop_event, 0);
    cudaEventSynchronize(stop_event);
    cudaEventElapsedTime(&time_kernel, start_event, stop_event);
    printf("kernel:\t\t% 15.8ef\n", time_kernel);

The code runs and reports a time near 2 ms. When I increase the array size I expect the elapsed time to increase, and I can tell the kernel really does take longer, because printing the results to the console window takes longer. Yet the reported time stays near 2 ms.

When I run Nsight, the capture time is more than 2 ms, as expected.

Can I rely on the capture time of Nsight to represent the elapsed time?
And how can I fix the 2 ms problem in the code above?

Manalo,

The method you are using to time the kernel above does not make sense: you have to call cudaEventRecord(start_event, 0) before you launch mykernel<<<1,1>>>(). As coded, I would expect the minimum timer resolution to be returned, which would be 0 or 33 ns if you are running the profiler, or 0 or 1 µs on late Kepler and early Maxwell cards if you are not running the profiler.
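
For reference, here is a minimal sketch of the corrected ordering, with the start event recorded before the launch and the stop event after it. The body of mykernel is a placeholder for your actual sorting kernel, and error checking is omitted for brevity:

    #include <cstdio>

    // Placeholder for the sorting kernel from the original post.
    __global__ void mykernel() { }

    int main()
    {
        float time_kernel;
        cudaEvent_t start_event, stop_event;
        cudaEventCreate(&start_event);
        cudaEventCreate(&stop_event);

        cudaEventRecord(start_event, 0);   // record the start event BEFORE the launch
        mykernel<<<1,1>>>();
        cudaEventRecord(stop_event, 0);    // record the stop event after the launch
        cudaEventSynchronize(stop_event);  // block until the stop event has completed
        cudaEventElapsedTime(&time_kernel, start_event, stop_event);  // result in ms
        printf("kernel:\t\t% 15.8ef\n", time_kernel);

        cudaEventDestroy(start_event);
        cudaEventDestroy(stop_event);
        return 0;
    }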

The information you have provided is not sufficient for someone to give additional feedback. It is recommended that you provide a simple reproducible example if you want help debugging the problem.