Will the following be an accurate capture of the time spent on the GPU?
CALL SYSTEM_CLOCK(ICOUNTIN,ICOUNT_RATE,ICOUNT_MAX)
!==
!== A BUNCH OF KERNEL LAUNCHES
!==
ISTAT=ISTAT+CUDADEVICESYNCHRONIZE()
CALL SYSTEM_CLOCK(ICOUNTOUT,ICOUNT_RATE,ICOUNT_MAX)
ITIME=ITIME+(ICOUNTOUT-ICOUNTIN)
GPU_TIME_IN_SECONDS=DBLE(ITIME)/DBLE(ICOUNT_RATE)
If not, why wouldn’t this work?
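For reference, here is a self-contained sketch of the snippet above that can be compiled and run. It assumes the NVIDIA HPC SDK (for example `nvfortran -cuda timing.cuf`). The kernel `scale_kernel` is a hypothetical stand-in for "A BUNCH OF KERNEL LAUNCHES". The sketch makes three assumptions of its own. It calls `cudaDeviceSynchronize()` before starting the clock, so that earlier asynchronous work is not charged to the interval. It uses 64-bit counters, because the resolution of `SYSTEM_CLOCK` depends on the integer kind and the compiler. For comparison, it also times the same launches with CUDA events, which record timestamps in the GPU stream rather than on the host.

```fortran
! Sketch only: assumes CUDA Fortran (nvfortran -cuda). Names are hypothetical.
module kernels_m
  implicit none
contains
  ! Hypothetical stand-in for "A BUNCH OF KERNEL LAUNCHES"
  attributes(global) subroutine scale_kernel(a)
    real :: a(:)              ! kernel dummy arrays live in device memory
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= size(a)) a(i) = 2.0 * a(i)
  end subroutine scale_kernel
end module kernels_m

program time_gpu
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 1048576, nlaunch = 10
  real, device, allocatable :: a_d(:)
  integer(8) :: icountin, icountout, icount_rate, icount_max
  integer :: istat, k
  real(8) :: gpu_time_in_seconds
  type(cudaEvent) :: start_ev, stop_ev
  real :: event_ms

  allocate(a_d(n))
  a_d = 1.0

  ! Drain earlier asynchronous work so it is not counted in this interval
  istat = cudaDeviceSynchronize()

  call system_clock(icountin, icount_rate, icount_max)
  do k = 1, nlaunch
    call scale_kernel<<<(n + 255) / 256, 256>>>(a_d)
  end do
  ! Kernel launches return immediately; wait until the GPU has finished
  istat = cudaDeviceSynchronize()
  call system_clock(icountout, icount_rate, icount_max)

  gpu_time_in_seconds = dble(icountout - icountin) / dble(icount_rate)
  print *, 'SYSTEM_CLOCK elapsed (s):', gpu_time_in_seconds

  ! Alternative: CUDA events recorded in the default stream (stream 0)
  istat = cudaEventCreate(start_ev)
  istat = cudaEventCreate(stop_ev)
  istat = cudaEventRecord(start_ev, 0)
  do k = 1, nlaunch
    call scale_kernel<<<(n + 255) / 256, 256>>>(a_d)
  end do
  istat = cudaEventRecord(stop_ev, 0)
  istat = cudaEventSynchronize(stop_ev)          ! wait for the stop event
  istat = cudaEventElapsedTime(event_ms, start_ev, stop_ev)  ! milliseconds
  print *, 'CUDA event elapsed (s):', event_ms / 1000.0

  istat = cudaEventDestroy(start_ev)
  istat = cudaEventDestroy(stop_ev)
  deallocate(a_d)
end program time_gpu
```

The `SYSTEM_CLOCK` figure is host wall-clock time between the two calls. It therefore includes launch overhead as well as kernel execution. The event figure covers only the work between the two events in the stream.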