Dear Nvidia users, I’m trying to overlap computation and memory transfers in the following code:
      subroutine add2s2_omp(a,b,c1,n)
      real a(n),b(n)
      real,value:: c1
      integer,value:: n
!$OMP TARGET TEAMS LOOP
      do i=1,n
         a(i)=a(i)+c1*b(i)
      enddo
      return
…
!$OMP TARGET DATA MAP(to:xx) MAP(from:b,bbar)
!$OMP TARGET UPDATE TO(bb) depend(out:xbar) nowait
      do k = 2,m
         call add2s2_omp(xbar,xx(:,k),alpha(k),n)
      end do
!$OMP TARGET UPDATE FROM(xbar) nowait
      do k = 2,m
         call add2s2_omp(bbar,bb(:,k),alpha(k),n)
      end do
!$OMP TARGET UPDATE FROM(xbar,xx) depend(in:xbar) nowait
      do k = 2,m
         call add2s2_omp(b,bb(:,k),-alpha(k),n)
      end do
!$OMP END TARGET DATA
      print *, xbar(1),bbar(1),b(1)
The problem is that the results are wrong. The correct result should be:
7.399999 5.980000 -0.4800002
Instead I get:
7.399999 5.980000 -3.780000
So the results for “b” are wrong, and I don’t understand why. Do I need to enable pinned memory? How can I enable pinned memory with OpenMP offload? I have attached the code example. Thanks.
add2s2_omp.f (2.1 KB)