The result changes after adding the OpenACC directives

After parallelizing the following code with OpenACC, the result differs from the result before parallelization. Why? How can I parallelize this program correctly?

!$acc parallel loop vector
       do k=n_start,n_end
         if (iflag(k).ne.0) then
            dx = xp(k) - xmin
            dz = zp(k) - zmin
            icell = int( dx * one_over_3h ) + 1
            kcell = int( dz * one_over_3h ) + 1
            ii    = icell + (kcell - 1)*ncx
!$acc atomic capture
           nc(ii,kind_p) = nc(ii,kind_p)+1
           ibox(ii,kind_p,nc(ii,kind_p))=k
!$acc end atomic
         endif
      enddo
!$acc end parallel

28, Generating implicit copy(ibox(:,kind_p,:))
Generating implicit copyin(iflag(n_start:n_end),xp(n_start:n_end),zp(n_start:n_end))
Generating implicit copy(nc(:,kind_p))
29, Complex loop carried dependence of nc prevents parallelization
Loop carried dependence due to exposed use of nc(:,kind_p),ibox(:,kind_p,:) prevents parallelization
Accelerator scalar kernel generated
Accelerator kernel generated
Generating Tesla code
29, !$acc loop seq

Hi kingpo,

There’s not enough information here to give you a definitive answer, and wrong answers can have several causes. It’s possible that data isn’t being copied properly between the host and device, or there could be a race condition in your code. If you could post a full reproducing example, that would be helpful.

I do notice that your atomic capture is in an incorrect form. Try capturing the updated value of the array element into a local variable, then use that variable as the index into ibox.

Something like:

!$acc parallel loop vector 
       do k=n_start,n_end 
         if (iflag(k).ne.0) then 
            dx = xp(k) - xmin 
            dz = zp(k) - zmin 
            icell = int( dx * one_over_3h ) + 1 
            kcell = int( dz * one_over_3h ) + 1 
            ii    = icell + (kcell - 1)*ncx 
!$acc atomic capture 
           nc(ii,kind_p) = nc(ii,kind_p)+1 
           idx = nc(ii,kind_p) 
!$acc end atomic 
!$acc atomic write 
           ibox(ii,kind_p,idx)=k 
         endif 
      enddo 
!$acc end parallel

-Mat

Thanks for your answer. It is still not right.
I think the problem is with the OpenACC directives. Once I add the OpenACC directives, the result is wrong; without them, the result is right. My code is part of a large project that contains more than 30 subroutine .F files. Should I upload the entire project, or only the subroutine code?
As I don’t know how to upload files, I will post some code below.

   subroutine step
      include 'common.2D'

      call ini_divide(2)
      call divide(nbp1,npt,2)

      end



 subroutine ini_divide(kind_p)
      include 'common.2D'

!$acc declare present(nc(:,:),ibox(:,:,:),nct)

!$acc update device(nct)

!$acc kernels copyin(nplink_max)
      do i=1,nct
            nc(i,kind_p)  = 0
            ibox(i,kind_p,1:nplink_max)  = 0
      enddo
!$acc end kernels

!$acc update host(nc(:,:),ibox(:,:,:))

      return

      end



 subroutine divide(n_start,n_end,kind_p)
      include 'common.2D'

!$acc parallel loop vector
       do k=n_start,n_end
         if (iflag(k).ne.0) then
            dx = xp(k) - xmin
            dz = zp(k) - zmin
            icell = int( dx * one_over_3h ) + 1
            kcell = int( dz * one_over_3h ) + 1
            ii    = icell + (kcell - 1)*ncx
!$acc atomic capture
           nc(ii,kind_p) = nc(ii,kind_p)+1
           idx = nc(ii,kind_p)
!$acc end atomic
!$acc atomic write
           ibox(ii,kind_p,idx)=k
         endif
      enddo
!$acc end parallel

       return
       end

Are you updating the host data for “nc” and “ibox” someplace higher in the code?

Once I add the OpenACC directives, the result is wrong; without them, the result is right.

Can you give more detail about the wrong answer? Are you getting zeros in the results? Garbage values? Slightly incorrect answers?

Zeros or garbage values are most likely a problem with not synchronizing the host and device copies of the data.

Slightly incorrect answers are more likely to be a problem with the compute loops, such as a race condition.
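
For example, if the host fills or reads these arrays between device regions, explicit updates along these lines keep the two copies consistent. This is just a sketch using the array names from your posted code; where exactly the updates belong depends on where the host touches the data:

! after a device loop fills nc/ibox, refresh the host copies
!$acc update host(nc(:,:),ibox(:,:,:))

! after host code modifies nc/ibox, refresh the device copies
!$acc update device(nc(:,:),ibox(:,:,:))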

-Mat

Thank you very much for your advice. I’ve found the problem. It was a data transfer problem, and I have fixed it.
But the result is still slightly wrong. The result is correct when using the ‘seq’ clause in the parallel region, but when using the ‘vector’ clause the result has a small error.

Which result is slightly wrong? ibox?

The order in which ibox is updated is non-deterministic, so it may contain different values when run in parallel.

How can I fix this problem?

Hi kingpo,

Since I have incomplete information, it’s difficult for me to offer advice here. From the information given, it seems likely that this loop is not parallelizable and you should run it serially. Perhaps there’s an alternative algorithm you can use?
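
One illustrative possibility, purely a sketch and only if the differences really do come from the fill order: keep the atomic fill, then sort each cell’s list into ascending particle index afterwards, which is the order a serial fill would produce. The placement (right after the parallel loop in divide) and the local names np_cell, itmp and mm are my assumptions, not something from your code:

!$acc parallel loop private(np_cell,itmp,mm)
      do ii=1,nct
        np_cell = nc(ii,kind_p)
!       insertion sort of ibox(ii,kind_p,1:np_cell) into ascending order
        do jj=2,np_cell
          itmp = ibox(ii,kind_p,jj)
          mm   = jj - 1
          do while (mm.ge.1)
            if (ibox(ii,kind_p,mm).le.itmp) exit
            ibox(ii,kind_p,mm+1) = ibox(ii,kind_p,mm)
            mm = mm - 1
          enddo
          ibox(ii,kind_p,mm+1) = itmp
        enddo
      enddo
!$acc end parallel

If the lists then match the serial fill, the sums in the rest of the code are evaluated in the same order as before and the results should agree; if they still differ, the cause is elsewhere.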

-Mat

The following code is where the ibox and nc data are used. Is the previous loop only running serially?

     subroutine celij(j1,j2,kind_p1,ini_kind_p2,lx2)
c
      include 'common.2D'  
       
      do kind_p2=ini_kind_p2,2
        if(nc(j2,kind_p2).ne.0) then

        do ii=1,nc(j1,kind_p1)
          i = ibox(j1,kind_p1,ii)
         
          do jj=1,nc(j2,kind_p2)
           j = ibox(j2,kind_p2,jj)
            
            drx = xp(i) - xp(j)
            drz = zp(i) - zp(j)

            call periodicityCorrection(i,j,drx,drz,lx2)

            rr2 = drx*drx + drz*drz

            !if(rr2.lt.fourh2.and.rr2.gt.1.e-18) then
            if(rr2.lt.enineh2.and.rr2.gt.1.e-18) then
             dux = up(i) - up(j)
             duz = wp(i) - wp(j)

c            Calculating kernel & Normalized Kernel Gradient
             call kernel(drx,drz,i,j,j1,j2,rr2) 
             call kernel_correction(i,j)
           
.............................
         enddo
        enddo
        endif
      enddo

      end

Is the previous loop only running serially?

If you’re using the “parallel” directive, then no, it would be running in parallel. To have it run serially, use the “serial” directive.
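
Something along these lines, as a sketch of the serial variant (untested against your full code):

!$acc serial
       do k=n_start,n_end
         if (iflag(k).ne.0) then
            dx = xp(k) - xmin
            dz = zp(k) - zmin
            icell = int( dx * one_over_3h ) + 1
            kcell = int( dz * one_over_3h ) + 1
            ii    = icell + (kcell - 1)*ncx
!           single device thread, so no atomics are needed and the
!           fill order matches the original host loop
            nc(ii,kind_p) = nc(ii,kind_p)+1
            ibox(ii,kind_p,nc(ii,kind_p))=k
         endif
      enddo
!$acc end serial

It still runs on the device, so nc and ibox stay where your declare/present data region put them, but it uses a single thread and will be correspondingly slow.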

I may be of better help if you can send me the full source. Can you send it to PGI Customer Service at support@pgroup.com?

-Mat

Thanks Mat! I have sent my code to support@pgroup.com. Please check it!