syntax error ?

Hello,
can you help me with this simple piece of code?

###########################

!$acc region
do i=1,N

   x(i)=x(i)+dx

   c1=0 ; c2=0
   do j=1,N
      if (x(i)<xold(j)) then
         c1=c1+1
      endif

      if (x(i)>xold(j)) then
         c2=c2+1
      endif
   enddo
   Fx=dble(c1-c2)/dble(N)

   v(i)=v(i)+dv

   F(i)=Fx

enddo !!! end i
!$acc end region

###########################

I don't know why, but it does not reproduce the result I get
when the code is compiled and run serially, without the accelerator.

The compilation output seems to be ok:

###########################

34, Loop unrolled 16 times
35, Loop unrolled 4 times
40, Loop unrolled 8 times
68, Loop unrolled 16 times
83, Generating present_or_copy(f(1:8192))
Generating present_or_copy(v(1:8192))
Generating present_or_copy(x(1:8192))
Generating present_or_copyin(xold(1:8192))
Generating compute capability 2.0 binary
84, Loop is parallelizable
Accelerator kernel generated
84, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
CC 2.0 : 27 registers; 0 shared, 104 constant, 0 local memory bytes
91, Loop is parallelizable
121, Loop unrolled 16 times
energy:
147, Loop unrolled 4 times

#############################

I think the problem is with the variables inside the " j " loop. How can I tell the compiler that each " i " iteration has its own c1, c2 and Fx?

Thank you very much

Hi alechand,

I think the problem is with the variables inside the " j " loop. How can I tell the compiler that each " i " iteration has its own c1, c2 and Fx?

Scalar variables such as c1, c2, and Fx are privatized by default, so each thread does have its own copy. I doubt this is the problem.
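
For reference, the per-iteration scalars can also be declared private explicitly instead of relying on the default. The following is only a minimal sketch under assumptions: it uses the OpenACC combined parallel loop directive rather than the region directive from the original post, and the program name, the array size and the test data are made up for illustration.

```fortran
program private_scalars
  implicit none
  integer, parameter :: n = 8          ! hypothetical problem size
  double precision :: x(n), xold(n), f(n), fx
  integer :: i, j, c1, c2

  ! Illustrative data only: x is xold shifted by half a step.
  do i = 1, n
     xold(i) = dble(i)
     x(i)    = dble(i) + 0.5d0
  end do

  ! private(...) spells out what the compiler already does by default for
  ! scalars: every iteration of i works on its own c1, c2 and fx.
  !$acc parallel loop private(c1, c2, fx) copyin(x, xold) copyout(f)
  do i = 1, n
     c1 = 0
     c2 = 0
     do j = 1, n
        if (x(i) < xold(j)) c1 = c1 + 1
        if (x(i) > xold(j)) c2 = c2 + 1
     end do
     fx   = dble(c1 - c2) / dble(n)
     f(i) = fx
  end do
  !$acc end parallel loop

  print '(8f8.3)', f
end program private_scalars
```

A compiler without OpenACC support treats the directives as comments and runs the loop serially, which gives a host reference to compare against.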

I don't know why, but it does not reproduce the result I get
when the code is compiled and run serially, without the accelerator.

Can you post a reproducing example, or send one to PGI Customer Service (trs@pgroup.com)? I can’t tell what’s wrong from this snippet.

  • Mat

Hi,
I found the problem. At the beginning of the code I have to generate some random numbers. I was comparing the results from pgfortran and gfortran, but each compiler produces different random numbers from the default seed…

Now that I always use pgfortran for the comparison, it works.

Thanks
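
One way to keep such comparisons independent of each compiler's built-in random number generator is to produce the initial values with a small hand-written generator. This is a minimal sketch, not the code used in this thread: it assumes a Park-Miller "minimal standard" generator evaluated with Schrage's method and default integers of at least 32 bits, and the names portable_rng_demo, next_uniform and seed are hypothetical.

```fortran
program portable_rng_demo
  implicit none
  integer, parameter :: n = 5      ! hypothetical problem size
  integer :: seed, i
  double precision :: x(n)

  seed = 12345                     ! any value in 1 .. 2147483646
  do i = 1, n
     x(i) = next_uniform(seed)
  end do
  print '(5f12.8)', x

contains

  ! Park-Miller generator: s <- 16807*s mod (2**31 - 1), evaluated with
  ! Schrage's method so the products never overflow a 32-bit integer.
  function next_uniform(s) result(u)
    integer, intent(inout) :: s
    double precision :: u
    integer, parameter :: a = 16807, m = 2147483647, q = 127773, r = 2836
    integer :: k
    k = s / q
    s = a * (s - k * q) - r * k
    if (s < 0) s = s + m
    u = dble(s) / dble(m)          ! uniform value strictly between 0 and 1
  end function next_uniform

end program portable_rng_demo
```

Because the generator uses only integer arithmetic and one division, the same seed should give the same starting values under pgfortran and gfortran.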

Hi Mat,
Coming back to the previous problem: the results are not the same (compared with a host execution) if I increase the TIME variable in my code:

#######################

!$acc data

do t=1,TIME

   !$acc region
   do i=1,N

      x(i)=x(i)+DT1*v(i)+DT2*F(i)

      !!! calculate forces
      !Fx=0.
      c1=0 ; c2=0
      do j=1,N
         if (x(i)<xold(j)) then
            !Fx=Fx+1./dble(N)
            c1=c1+1
         endif

         if (x(i)>xold(j)) then
            !Fx=Fx-1./dble(N)
            c2=c2+1
         endif
      enddo
      Fx=dble(c1-c2)/dble(N)
      !!! calculate forces_end

      v(i)=v(i)+DT1/2.*( F(i) + Fx )

      F(i)=Fx

   enddo !!! end i
   !$acc end region

   xold=x

enddo !!! end t

!$acc end data

############################

For example, if I use N=500 and TIME=100000,
the results are different, but if I use TIME=10000, they are the same.

It seems the compiler cannot compute things correctly when I increase TIME…

Can you help? Thanks

Hi alechand,

My best guess is that you’re overflowing the data range of x, but there’s not enough info here to be sure. Please either post or send me a reproducing test case and I’ll see if I can determine the problem.

  • Mat

Thanks Mat,
i sent you an email.

Hi alechand,

It appears the difference is caused by accumulated rounding error when using fused-multiply-add (FMA) instructions. The same issue can be seen on the CPU when using higher optimizations. Try adding the flag “-ta=nvidia,nofma” to see if this helps.

  • Mat

Mat,
unfortunately, this did not help.

I was thinking: if the results agree “exactly” using TIME=10000,
how can there be accumulated errors?

Do you have any other ideas?
I really appreciate your attention.

We were able to track this down. It looks like at least a few x(i) and xold(j) values differ by less than a small margin of error. With slight changes in precision, the outcome of the comparison in these cases may flip, leading to divergent values of c1 and c2. This then has a cascading effect which leads to the eventual wrong answer.

  • Mat
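
The flip described above can be reproduced on the host with two values that differ only at rounding level. The sketch below is illustrative and not a fix proposed in this thread: the tolerance tol, its value and the variable names are assumptions, and treating near-ties as equal changes the model slightly, so it is a design choice rather than a drop-in correction.

```fortran
program near_tie
  implicit none
  double precision, parameter :: tol = 1.0d-12   ! hypothetical tolerance
  double precision :: xi, xoldj
  integer :: c1, c2

  ! Two values that are equal on paper; in IEEE double precision the sum
  ! typically rounds to slightly more than 0.3.
  xi    = 0.1d0 + 0.2d0
  xoldj = 0.3d0

  ! Strict comparison, as in the original loop: a difference in the last
  ! bit decides which counter is incremented.
  c1 = 0; c2 = 0
  if (xi < xoldj) c1 = c1 + 1
  if (xi > xoldj) c2 = c2 + 1
  print *, 'strict:   c1 - c2 =', c1 - c2

  ! Tolerance-based comparison: values closer than tol count as a tie,
  ! so small precision differences cannot flip the outcome.
  c1 = 0; c2 = 0
  if (xi < xoldj - tol) c1 = c1 + 1
  if (xi > xoldj + tol) c2 = c2 + 1
  print *, 'tolerant: c1 - c2 =', c1 - c2
end program near_tie
```

If the two versions print different values of c1 - c2, that is the same sensitivity that lets the host and accelerator runs drift apart once TIME is large enough for a near-tie to occur.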