Hello,

I am comparing the CUDA and OpenACC versions of my code and have tried profiling both with the CUDA Visual Profiler.

I have tried to make the two codes as close as possible, but I still get different profiling results.

Here is the profiling result for the CUDA code:

Could you please explain where these small data copy calls (the thin blue lines before and after each kernel) come from?

My OpenACC code looks like this:

```
!$acc data create( hvx, hvy, hvz, grdx, grdy, grdz), &
!$acc copyin (vx,vy, vz, h) , &
!$acc copyout (dh,dvx, dvy, dvz), &
!$acc create (scl, omega)
! first kernel
!$acc kernels loop gang vector(4) create (depth), present (CNST_EGRAV, GRD_zs, ADM_VNONE)
do l=1,ADM_lall
   !$acc loop gang vector(128)
   do n =1, ADM_gall
      scl(n,k,l)=&
         -( CNST_EGRAV*(h(n,k,l)) &
         +0.5D0*( vx(n,k,l)*vx(n,k,l) &
         +vy(n,k,l)*vy(n,k,l) &
         +vz(n,k,l)*vz(n,k,l) ) )
      depth=h(n,k,l)-GRD_zs(n,k,l,ADM_VNONE)
      hvx(n,k,l)=depth*vx(n,k,l)
      hvy(n,k,l)=depth*vy(n,k,l)
      hvz(n,k,l)=depth*vz(n,k,l)
   end do
   !$acc end kernels
   !$acc update host(scl)
end do
!Other kernels
!$acc end data
```


Thank you,

Irina.