program nv
parameter(idim=10000)
real a(idim),b(idim)
a(1:idim)=0.0
b(1:idim)=0.0
!$acc region
do k=1,5000
do i=1,idim
a(i)=a(i)+ sin(i*1.0)*cos(i*1.0)
end do
end do
b(1:idim)=a(1:idim)*0.71
a(1)=b(idim)
!$acc end region
write(*,*) a(1),a(idim),b(idim)
end program nv
When I compile it with
pgf90 nv.f90
and run it, I get
time ./a.out
1033.070 1455.028 1033.070
real 0m3.451s
user 0m3.403s
sys 0m0.003s
When I try to use the GPU and compile it with
pgf90 -ta=nvidia,cc11 nv.f90
I get
time ./a.out
2273.086 1455.028 1033.070
real 0m0.215s
user 0m0.114s
sys 0m0.097s
Note that the first value, a(1), is different. It seems that the assignment
a(1)=b(idim)
is not executed.
Is this correct behaviour? Are simple assignments not allowed in GPU code?
The compiler version is 10.1, and everything runs on a Fedora 11 system.
Thanks in advance.
Robert
Currently, kernels are only created from loops, so your scalar assignment is being missed. That is why a(1) still holds the value accumulated in the loop (2273.086) instead of b(idim). This is a known deficiency which our engineers are working on addressing. To help things along, I sent a report to our engineers (TPR#16548).
To work around this limitation, either perform the scalar assignment on the host or put the scalar assignment in a loop that’s executed only once.
For example:
program nv
parameter(idim=10000)
real a(idim),b(idim)
a(1:idim)=0.0
b(1:idim)=0.0
!$acc region
do k=1,5000
do i=1,idim
a(i)=a(i)+ sin(i*1.0)*cos(i*1.0)
end do
end do
b(1:idim)=a(1:idim)*0.71
!$acc end region
! Workaround 1: do the scalar assignment on the host, after the region
a(1)=b(idim)
write(*,*) a(1),a(idim),b(idim)
end program nv
or
program nv
parameter(idim=10000)
real a(idim),b(idim)
a(1:idim)=0.0
b(1:idim)=0.0
!$acc region
do k=1,5000
do i=1,idim
a(i)=a(i)+ sin(i*1.0)*cos(i*1.0)
end do
end do
b(1:idim)=a(1:idim)*0.71
! Workaround 2: wrap the scalar assignment in a single-trip loop
do k=1,1
a(k)=b(idim)
end do
!$acc end region
write(*,*) a(1),a(idim),b(idim)
end program nv