Timing GPU Region

Will the following be an accurate capture of the time spent on the GPU?

      CALL SYSTEM_CLOCK(ICOUNTIN,ICOUNT_RATE,ICOUNT_MAX)

!==
!==   A BUNCH OF KERNEL LAUNCHES
!==

      ISTAT=ISTAT+CUDADEVICESYNCHRONIZE()

      CALL SYSTEM_CLOCK(ICOUNTOUT,ICOUNT_RATE,ICOUNT_MAX)
      ITIME=ITIME+(ICOUNTOUT-ICOUNTIN)
      GPU_TIME_IN_SECONDS=DBLE(ITIME)/DBLE(ICOUNT_RATE)

If it is not, why wouldn’t this work?

Hi Sarom,

This code gives you the total wall-clock time the host spends between the two SYSTEM_CLOCK calls. That includes host execution time and GPU execution time, as well as any data-movement time. If you want just the GPU kernel times, then you need to use CUDA Events or profile the code.
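
To illustrate the CUDA Events approach, here is a minimal sketch in CUDA Fortran, assuming the cudafor module from the NVIDIA HPC compilers; the kernel launches are placeholders for your own:

      ! Sketch: timing a GPU region with CUDA events (needs nvfortran and a GPU)
      PROGRAM EVENT_TIMING
        USE CUDAFOR
        IMPLICIT NONE
        TYPE(CUDAEVENT) :: EVSTART, EVSTOP
        REAL :: MS
        INTEGER :: ISTAT

        ISTAT = CUDAEVENTCREATE(EVSTART)
        ISTAT = CUDAEVENTCREATE(EVSTOP)

        ISTAT = CUDAEVENTRECORD(EVSTART, 0)    ! enqueue start marker on stream 0
!==
!==     A BUNCH OF KERNEL LAUNCHES
!==
        ISTAT = CUDAEVENTRECORD(EVSTOP, 0)     ! enqueue stop marker after the kernels
        ISTAT = CUDAEVENTSYNCHRONIZE(EVSTOP)   ! wait for the stop event to complete
        ISTAT = CUDAEVENTELAPSEDTIME(MS, EVSTART, EVSTOP)  ! elapsed GPU time in ms
        PRINT *, 'GPU TIME IN SECONDS: ', MS/1000.0

        ISTAT = CUDAEVENTDESTROY(EVSTART)
        ISTAT = CUDAEVENTDESTROY(EVSTOP)
      END PROGRAM EVENT_TIMING

Because the events are recorded in the same stream as the kernels, the elapsed time measures only the device-side work between them, not the host overhead that SYSTEM_CLOCK would also pick up.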

  • Mat