I am working on asynchronous execution using a subroutine similar to this one:
```fortran
subroutine sub1(stream)
  integer, intent(in) :: stream
  !$acc data create(...) copy(...) async(stream)
  !$acc parallel loop async(stream)
  do ...
     ...
  end do
  !$acc end parallel loop
  !$acc end data
end subroutine sub1
```
The subroutine is called several times from code like this:
```fortran
do i = 1, num_chunks
   streamid = mod(i, 2) + 1   ! create the ids of two streams: 1, 2
   call sub1(streamid)
end do
```
The idea is to use two (or more) streams so that the chunks execute in a pipelined fashion. My question is whether the `!$acc end data` directive implicitly performs a synchronization (such as a cudaEventSynchronize call) that would prevent concurrent execution.
I hope you can clarify this.
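For comparison, a minimal sketch of the same two-stream pipeline written without a structured data region: each chunk is moved with unstructured `!$acc enter data` / `!$acc exit data` directives that carry `async(stream)`, and the host waits only once, with `!$acc wait`, after all chunks have been queued. It assumes each chunk is one column of a real array `a` and that the loop body simply doubles each element; `a`, `n`, `nc`, `ic`, `pipeline_sketch` and `run_chunks` are hypothetical names, not part of the original code. Whether transfers and kernels actually overlap still depends on the compiler and runtime.

```fortran
! Hypothetical sketch, not the original code: each chunk is one column of a.
! The array name, sizes, and the loop body (doubling) are assumptions.
module pipeline_sketch
  implicit none
contains

  ! Queue copy-in, compute, and copy-out of chunk ic on one async queue.
  ! None of these directives blocks the host, so the next chunk can be
  ! queued on the other stream while this one is still in flight.
  subroutine sub1(a, n, nc, ic, stream)
    integer, intent(in) :: n, nc, ic, stream
    real, intent(inout) :: a(n, nc)
    integer :: j
    !$acc enter data copyin(a(:, ic)) async(stream)
    !$acc parallel loop present(a(:, ic)) async(stream)
    do j = 1, n
       a(j, ic) = 2.0 * a(j, ic)
    end do
    !$acc exit data copyout(a(:, ic)) async(stream)
  end subroutine sub1

  ! Driver: alternate chunks between queues 1 and 2, then wait once.
  subroutine run_chunks(a, n, nc)
    integer, intent(in) :: n, nc
    real, intent(inout) :: a(n, nc)
    integer :: i, streamid
    do i = 1, nc
       streamid = mod(i, 2) + 1   ! ids of two streams: 1, 2
       call sub1(a, n, nc, i, streamid)
    end do
    !$acc wait   ! the host blocks here only, after all chunks are queued
  end subroutine run_chunks

end module pipeline_sketch
```

After filling `a`, the whole pipeline is started with `call run_chunks(a, n, nc)`. Without OpenACC enabled the directives are treated as comments and the code runs serially with the same result, which makes it easy to check correctness before looking at concurrency.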