Will the following accurately capture the time spent on the GPU?
      CALL SYSTEM_CLOCK(ICOUNTIN,ICOUNT_RATE,ICOUNT_MAX)
!==
!==   A BUNCH OF KERNEL LAUNCHES
!==
      ISTAT=ISTAT+CUDADEVICESYNCHRONIZE()
      CALL SYSTEM_CLOCK(ICOUNTOUT,ICOUNT_RATE,ICOUNT_MAX)
      ITIME=ITIME+(ICOUNTOUT-ICOUNTIN)
      GPU_TIME_IN_SECONDS=DBLE(ITIME)/DBLE(ICOUNT_RATE)
If not, why wouldn’t this work?