I am seeing something strange when I parallelize a loop with OpenACC: the CPU and GPU results differ in the second iteration.

Here is the code:

```
!$acc data copy(u,v,w,lx,ly,lz,mx,my,mz,c,r1,r2,r3,r4,r5,
!$acc& ub,kx,ky,kz,gm1)
!$acc kernels loop present(u,v,w,lx,ly,lz,mx,my,mz,c,ub,kx,ky,kz,
!$acc& r1,r2,r3,r4,r5,gm1) private(m,vb,wb,q2,t3,t1,t2)
do 1000 m=1,n
vb = u(m)*lx(m)+v(m)*ly(m)+w(m)*lz(m)
wb = u(m)*mx(m)+v(m)*my(m)+w(m)*mz(m)
c
q2 = 0.5e0*(u(m)*u(m)+v(m)*v(m)+w(m)*w(m))
c
t3 = 1.e0/c(m)
t1 = -q2*r1(m)-r5(m)+u(m)*r2(m)+v(m)*r3(m)+w(m)*r4(m)
t1 = gm1*t1*t3*t3
t2 = -ub(m)*r1(m)+kx(m)*r2(m)+ky(m)*r3(m)+kz(m)*r4(m)
t2 = t2*t3
t3 = -vb*r1(m)+lx(m)*r2(m)+ly(m)*r3(m)+lz(m)*r4(m)
c
r3(m) = -wb*r1(m)+mx(m)*r2(m)+my(m)*r3(m)+mz(m)*r4(m)
r1(m) = r1(m)+t1
r2(m) = t3
r4(m) = 0.5e0*(t2-t1)
r5(m) = r4(m)-t2
1000 continue
!$acc end kernels loop
!$acc end data
```

“gm1” in the code is a variable in a common block.

I have also tried !$acc parallel loop and !$acc parallel loop seq. Both give the same results as the code above. In addition, I compile the code with the -r8 option.
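Since -r8 changes the default real kind, it may be worth checking what precision the literals such as 0.5e0 actually get in this build. The sketch below is a hypothetical standalone program (the name `check_kinds` is an assumption, not part of my code) written in free-form Fortran. My understanding is that with -r8 both printed kinds should be 8 instead of 4, but that should be verified with the actual compiler.

```fortran
! Hypothetical check, not part of the original code:
! prints the kind of a default REAL variable and of a single-precision-style
! literal, so the effect of a flag such as -r8 can be observed directly.
program check_kinds
  implicit none
  real :: x

  x = 0.5e0
  print *, 'kind of default real :', kind(x)
  print *, 'kind of literal 0.5e0:', kind(0.5e0)
end program check_kinds
```

Building this once with and once without -r8 shows whether the constants in the loop are promoted together with the variables.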

The compiler feedback is as follows:

```
with parallel loop
75, Generating copy(r5(:),ub(:),v(:),kz(:),lx(:),ly(:),lz(:),mx(:),my(:),mz(:),r1(:),r2(:),r3(:),r4(:),u(:),c(:),kx(:),ky(:),w(:))
76, Generating present(r5(:),ub(:),w(:),v(:),kz(:),lx(:),ly(:),lz(:),mx(:),my(:),u(:),c(:),kx(:),ky(:),mz(:),r1(:),r2(:),r3(:),r4(:))
78, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
78, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
with parallel loop seq
75, Generating copy(r5(:),ub(:),v(:),kz(:),lx(:),ly(:),lz(:),mx(:),my(:),mz(:),r1(:),r2(:),r3(:),r4(:),u(:),c(:),kx(:),ky(:),w(:))
79, Generating present(r5(:),ub(:),w(:),v(:),kz(:),lx(:),ly(:),lz(:),mx(:),my(:),u(:),c(:),kx(:),ky(:),mz(:),r1(:),r2(:),r3(:),r4(:))
Accelerator kernel generated
Generating Tesla code
81, !$acc loop seq
```

When I add OpenACC directives to another part of the code, such as the loop below, the CPU and GPU results are the same.

```
do 10001 m=1,n
t1 = 1.e0/c(m)
rrho = 1.e0/rho(m)
xm2 = xm2a(m)*t1
xm2ar = 1.0/xm2a(m)
fplus = (eig2(m)-ub(m))*xm2ar
fmins = -(eig3(m)-ub(m))*xm2ar
r11 = r1(m)
r21 = r2(m)
r31 = r3(m)
r41 = r4(m)
r51 = r5(m)
vmag1 = u(m)**2 + v(m)**2 + w(m)**2
r5t = gm1*(0.5*vmag1*r11
. - (u(m)*r21 + v(m)*r31 + w(m)*r41) + r51)
c
c ---- multiplication by inverse of precond. matrix
c
r1(m) = r11 - (1.-xm2)*r5t*t1*t1
r2(m) = rrho*(-u(m)*r11 + r21)
r3(m) = rrho*(-v(m)*r11 + r31)
r4(m) = rrho*(-w(m)*r11 + r41)
r5(m) = xm2*r5t
c
c ---- multiplication by T(inverse)
c
r5t = r5(m)*t1*t1
r1(m) = r1(m)-r5t
c
t2 = rho(m)*r2(m)
t3 = rho(m)*r3(m)
t4 = rho(m)*r4(m)
c
r2(m) = lx(m)*t2+ly(m)*t3+lz(m)*t4
r3(m) = mx(m)*t2+my(m)*t3+mz(m)*t4
r4(m) = 0.5*(t1*(kx(m)*t2+ky(m)*t3+kz(m)*t4)
. + r5t*fplus)
r5(m) = -0.5*(t1*(kx(m)*t2+ky(m)*t3+kz(m)*t4)
. - r5t*fmins)
10001 continue
```