CUDA Timing

Hi,

I have a question about timing kernel executions, including kernels launched inside a loop. For example, I am using the following code, and I want to get the overall running time of all kernel invocations.

  cudaEvent_t start, stop;
  float time;  // elapsed time in milliseconds

  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  // start timer
  cudaEventRecord(start, 0);

  // execute the kernels
  kernel<<< grid, threads >>>(.......);
  kernel_1<<< blocksPerGrid, threadsPerBlock >>>(.....);
  cudaMemcpy(A, B, mem_size_B, cudaMemcpyDeviceToDevice);

  for (k = 2; k < v1; k++)
  {
    // execute the kernel
    kernel<<< grid1, threads >>>(.......);
    cudaMemcpy(dA, dB, mem_size_B, cudaMemcpyDeviceToDevice);
  }
  kernel<<< grid2, threads >>>(.........);
  cudaMemcpy(......);

  // stop timer
  cudaEventRecord(stop, 0);
  cudaEventSynchronize(stop);
  cudaEventElapsedTime(&time, start, stop);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);

Regards,
Kashif

Hi mkashifhanif,

Record an event immediately before and immediately after each kernel invocation to get the execution time of each kernel.
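A minimal sketch of timing a single kernel with its own event pair might look like the following. The kernel `scale`, the helper `timeScaleKernel`, and the buffer size are placeholders invented for illustration and are not from the original post; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (not from the original post).
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one launch of `scale` on n elements; returns milliseconds.
float timeScaleKernel(int n)
{
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start = nullptr, stop = nullptr;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Events recorded directly around the launch, in the same stream.
    cudaEventRecord(start, 0);
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);

    // Wait until the stop event has actually been reached on the GPU.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main()
{
    printf("scale kernel: %f ms\n", timeScaleKernel(1 << 20));
    return 0;
}
```

The same pattern can be repeated around each kernel of interest, using one event pair per measurement or reusing a pair once its elapsed time has been read.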

Alternatively, you can get the kernel execution times from nvvp.

Best regards!

Thanks for your kind reply,

I want to time all of the kernel executions as a whole. Do you mean that I should time each kernel individually and add the results together? If so, what about the kernel inside the loop?

I am running on Linux. Also, what is nvvp?

Best Regards,

Kashif

Hi mkashifhanif,

It is possible to record events inside the loop as well.
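As a rough sketch of that idea, one event pair can be reused on every iteration and the per-iteration kernel times summed, leaving the device-to-device copy out of the measurement. The kernel `step`, the helper `timeLoopKernels`, and the iteration count are hypothetical and only stand in for the code in the question; error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel launched inside the loop.
__global__ void step(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Sums the kernel-only time over all iterations; returns milliseconds.
float timeLoopKernels(int n, int iterations)
{
    const size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemset(d_b, 0, bytes);

    cudaEvent_t start = nullptr, stop = nullptr;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float total_ms = 0.0f;
    for (int k = 0; k < iterations; ++k) {
        cudaEventRecord(start, 0);
        step<<<blocks, threads>>>(d_b, n);
        cudaEventRecord(stop, 0);

        // The elapsed time can only be read after the stop event completes.
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        total_ms += ms;

        // Outside the event pair, so not included in total_ms.
        cudaMemcpy(d_a, d_b, bytes, cudaMemcpyDeviceToDevice);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_a);
    cudaFree(d_b);
    return total_ms;
}

int main()
{
    printf("kernels in loop: %f ms total\n", timeLoopKernels(1 << 20, 10));
    return 0;
}
```

Synchronizing on every iteration stalls the host each time around the loop. If only the overall time is needed, and the copies should be counted too, a single event pair around the whole sequence, as in the original question, avoids that.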

You can also use a profiler to measure the kernel execution time. Either the command-line profiler or the visual profiler will work. Please refer to their manuals.

Best regards!