Hello,
Can you help me with this simple piece of code?
###########################
!$acc region
do i=1,N
   x(i)=x(i)+dx
   c1=0 ; c2=0
   do j=1,N
      if (x(i)<xold(j)) then
         c1=c1+1
      endif
      if (x(i)>xold(j)) then
         c2=c2+1
      endif
   enddo
   Fx=dble(c1-c2)/dble(N)
   v(i)=v(i)+dv
   F(i)=Fx
enddo !!! end i
!$acc end region
###########################
I don't know why, but it does not reproduce the result I get
when the code is compiled and run without parallelization.
The compilation output seems to be OK:
###########################
     34, Loop unrolled 16 times
     35, Loop unrolled 4 times
     40, Loop unrolled 8 times
     68, Loop unrolled 16 times
     83, Generating present_or_copy(f(1:8192))
         Generating present_or_copy(v(1:8192))
         Generating present_or_copy(x(1:8192))
         Generating present_or_copyin(xold(1:8192))
         Generating compute capability 2.0 binary
     84, Loop is parallelizable
         Accelerator kernel generated
         84, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 2.0 : 27 registers; 0 shared, 104 constant, 0 local memory bytes
     91, Loop is parallelizable
    121, Loop unrolled 16 times
energy:
    147, Loop unrolled 4 times
#############################
I think the problem is with the variables used inside the j loop. How can I tell the compiler that each iteration of i has its own c1, c2 and Fx?
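To make the question concrete, here is a minimal sketch of what I imagine this might look like. It assumes that the `private` clause of the PGI Accelerator `!$acc do` directive is the right tool for this, that c1 and c2 are integers, and that Fx, dx and dv are double precision; the program name and the initial values are placeholders for illustration only. Whether this actually removes the mismatch with the serial result is not confirmed.

```fortran
program private_scalars
  implicit none
  integer, parameter :: N = 8192
  double precision :: x(N), xold(N), v(N), F(N)
  double precision :: dx, dv, Fx
  integer :: i, j, c1, c2      ! assumed integer counters

  ! Placeholder initial values, for illustration only
  dx = 1.0d-3
  dv = 1.0d-3
  do i = 1, N
     xold(i) = dble(i) / dble(N)
     x(i)    = xold(i)
     v(i)    = 0.0d0
  enddo

  !$acc region
  ! Assumption: private(...) asks for a separate copy of each
  ! listed scalar for every iteration of the i loop
  !$acc do private(c1, c2, Fx)
  do i = 1, N
     x(i) = x(i) + dx
     c1 = 0 ; c2 = 0
     do j = 1, N
        if (x(i) < xold(j)) c1 = c1 + 1
        if (x(i) > xold(j)) c2 = c2 + 1
     enddo
     Fx   = dble(c1 - c2) / dble(N)
     v(i) = v(i) + dv
     F(i) = Fx
  enddo
  !$acc end region

  ! Print a couple of values to compare against the serial build
  print *, F(1), F(N)
end program private_scalars
```

Built without the accelerator flags, the directives are treated as plain comments, so the same source can serve as the serial reference for comparison.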
Thank you very much