Hello, developers:

I have a computational accuracy problem in my CUDA Fortran program (compiled with PGI 19.10). The kernel that performs the calculation is below:

attributes(global) subroutine diff_u(he,h,u,qx,sx,sy,manning,eps,nx,ny,mbc_1,mbc_2,mbc_3,mbc_4)

integer ,value::nx,ny,mbc_1,mbc_2,mbc_3,mbc_4

real*8 :: he(1:nx+mbc_1+mbc_2,1:ny+mbc_3+mbc_4)
real*8 :: h(1:nx+mbc_1+mbc_2,1:ny+mbc_3+mbc_4)

real*8 :: u(1:nx+mbc_1+mbc_2,1:ny+mbc_3+mbc_4)
real*8 :: qx(1:nx+mbc_1+mbc_2,1:ny+mbc_3+mbc_4)
real*8 :: sx(1:nx+mbc_1+mbc_2,1:ny+mbc_3+mbc_4)
real*8 :: sy(1:nx+mbc_1+mbc_2,1:ny+mbc_3+mbc_4)
real*8 :: manning(1:nx+mbc_1+mbc_2,1:ny+mbc_3+mbc_4)
real*8,value :: eps

integer :: i,j

i = (blockIdx%x-1)* blockDim%x + threadIdx%x

j = (blockIdx%y-1)* blockDim%y + threadIdx%y

u(i,j) = (-(sx(i,j)+eps)/(abs(sx(i,j)+eps))) *(1/manning(i,j))*he(i,j)**(2./3.)*&
dsqrt(abs(sx(i,j)))/(1+(ABS(sy(i,j))/(ABS(sx(i,j))+eps))**2)**0.25

end subroutine diff_u
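
To help judge how large a CPU/GPU difference is meaningful, here is a minimal CPU-only sketch that evaluates the same formula at a single point. It is only an illustration: the program name, the helper `diff_u_point`, and all sample input values are hypothetical and not taken from my real data. It evaluates the formula twice, once with the exponent written as `2./3.` (a single-precision constant, as in the kernel) and once with a double-precision `2/3`, and prints the relative difference. This does not show where the CPU/GPU mismatch comes from; it only gives a yardstick for how sensitive u is to rounding in that exponent.

```fortran
program diff_u_cpu_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical sample inputs (assumptions, not values from the real run)
  real(dp), parameter :: he0 = 0.5_dp
  real(dp), parameter :: sx0 = -1.0e-3_dp
  real(dp), parameter :: sy0 = 2.0e-3_dp
  real(dp), parameter :: manning0 = 0.03_dp
  real(dp), parameter :: eps0 = 1.0e-12_dp

  real(dp) :: u_mixed, u_double, rel

  ! Exponent as in the kernel: 2./3. is computed in single precision first
  u_mixed = diff_u_point(he0, sx0, sy0, manning0, eps0, real(2./3., dp))

  ! Exponent computed entirely in double precision
  u_double = diff_u_point(he0, sx0, sy0, manning0, eps0, 2.0_dp/3.0_dp)

  rel = abs(u_mixed - u_double) / abs(u_double)

  print '(a,es24.16)', 'u, exponent 2./3.       : ', u_mixed
  print '(a,es24.16)', 'u, exponent 2.d0/3.d0   : ', u_double
  print '(a,es24.16)', 'relative difference     : ', rel

contains

  ! Same expression as in diff_u, for one grid point; p is the exponent on he
  pure function diff_u_point(he, sx, sy, manning, eps, p) result(u)
    real(dp), intent(in) :: he, sx, sy, manning, eps, p
    real(dp) :: u
    u = (-(sx + eps) / abs(sx + eps)) * (1.0_dp / manning) * he**p * &
        sqrt(abs(sx)) / (1.0_dp + (abs(sy) / (abs(sx) + eps))**2)**0.25_dp
  end function diff_u_point

end program diff_u_cpu_check
```

This should build with any standard Fortran compiler, for example `gfortran diff_u_cpu_check.f90`. If the CPU/GPU differences I see are of the same order as the relative difference printed here or smaller, they are probably at rounding level rather than a logic error.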

The values of u computed on the GPU and on the CPU differ at some points, such as:

If you have encountered a similar problem, please share the solution. Thank you.