Dear NVIDIA expert,
Does the time measured with cudaEventRecord include CPU (host) statements, such as the ones in this loop?

    istat = cudaEventRecord(startEvent, 0)
    do i = 1, nStreams
       offset = (i - 1) * streamSize
       istat = cudaMemcpyAsync( &
            a_d(offset + 1), a(offset + 1), streamSize, stream(i))
       call kernel<<<streamSize/blockSize, blockSize, &
            0, stream(i)>>>(a_d, offset)
       istat = cudaMemcpyAsync( &
            a(offset + 1), a_d(offset + 1), streamSize, stream(i))
    enddo
    istat = cudaEventRecord(stopEvent, 0)
    istat = cudaEventSynchronize(stopEvent)
    istat = cudaEventElapsedTime(time, startEvent, stopEvent)

For example, is the time the host spends computing offset = (i - 1) * streamSize included in the total elapsed time?
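One way to check this empirically is to time the host-side issue loop with a CPU clock (the Fortran intrinsic system_clock) alongside the events and compare the two numbers. The two clocks measure different things: system_clock measures host wall-clock time, while each event is timestamped on the GPU when its stream reaches it. The following is a minimal sketch, not a definitive benchmark. It assumes a pinned host array (needed for truly asynchronous copies), a trivial illustrative kernel that adds 1.0 to each element, and sizes (blockSize, nStreams, n) chosen only for illustration; none of these come from the original post.

```fortran
! Sketch (assumption): illustrative kernel and sizes, not the original code.
module kernels_m
contains
  attributes(global) subroutine kernel(a, offset)
    implicit none
    real :: a(*)
    integer, value :: offset
    integer :: i
    i = offset + threadIdx%x + (blockIdx%x - 1) * blockDim%x
    a(i) = a(i) + 1.0
  end subroutine kernel
end module kernels_m

program event_vs_host
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: blockSize = 256, nStreams = 4
  integer, parameter :: n = 4 * 1024 * blockSize * nStreams
  integer, parameter :: streamSize = n / nStreams
  real, pinned, allocatable :: a(:)      ! pinned host memory so copies can overlap
  real, device, allocatable :: a_d(:)
  integer(kind=cuda_stream_kind) :: stream(nStreams)
  type(cudaEvent) :: startEvent, stopEvent
  real :: time
  integer :: istat, i, offset
  integer(kind=8) :: t0, t1, rate

  allocate(a(n), a_d(n))
  a = 0.0
  istat = cudaEventCreate(startEvent)
  istat = cudaEventCreate(stopEvent)
  do i = 1, nStreams
     istat = cudaStreamCreate(stream(i))
  end do

  call system_clock(t0, rate)            ! host clock: before any work is issued
  istat = cudaEventRecord(startEvent, 0)
  do i = 1, nStreams
     offset = (i - 1) * streamSize         ! host-only integer arithmetic
     istat = cudaMemcpyAsync(a_d(offset + 1), a(offset + 1), streamSize, stream(i))
     call kernel<<<streamSize / blockSize, blockSize, 0, stream(i)>>>(a_d, offset)
     istat = cudaMemcpyAsync(a(offset + 1), a_d(offset + 1), streamSize, stream(i))
  end do
  istat = cudaEventRecord(stopEvent, 0)
  call system_clock(t1)                  ! host clock: work issued, not necessarily finished
  istat = cudaEventSynchronize(stopEvent)
  istat = cudaEventElapsedTime(time, startEvent, stopEvent)

  print '(a, f10.4)', 'host issue time (ms):    ', 1000.0 * real(t1 - t0) / real(rate)
  print '(a, f10.4)', 'event elapsed time (ms): ', time

  ! Every element should have been incremented exactly once.
  if (maxval(abs(a - 1.0)) /= 0.0) error stop 'unexpected result'
  print '(a)', 'PASS'

  istat = cudaEventDestroy(startEvent)
  istat = cudaEventDestroy(stopEvent)
  do i = 1, nStreams
     istat = cudaStreamDestroy(stream(i))
  end do
  deallocate(a, a_d)
end program event_vs_host
```

Because the copies and kernel launches are asynchronous, the second system_clock call returns once the work has been issued rather than when it completes, so the two figures are not expected to match.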
Thank you very much!