Debug error when using OpenACC directives

Hi Mat,

I have written a program with OpenACC directives, but I cannot get the right answer. Please give me some advice about it:

!$acc parallel loop private(nb1)
      do k = 1, n3
        nb1 = 0
!$acc do reduction(+:nb1)
        do eid1 = 1, nb_dem
          if (Z(k) <= XYZ(3,eid1) .and. XYZ(3,eid1) <= D(k)) then
            nb1 = nb1 + 1
            list_z(nb1,k) = eid1
          else if (XYZ(3,eid1) <= Z(k) .and. Z(k) <= XYZ(6,eid1)) then
            nb1 = nb1 + 1
            list_z(nb1,k) = eid1
            mask1(eid1,k) = 100
          end if
        end do
        A1(k) = nb1
      end do
!$acc end parallel

Array ‘A1’ is right, but array ‘list_z’ is wrong. (PVF 13.10)

Thank you!

-bigwbxu

Hi bigwbxu,

First, you can’t use “nb1” as an index into list_z unless the “eid1” loop is run sequentially. While “nb1” is private to the outer loop, it’s shared within the inner loop, so its value depends on the previous iteration’s value. Forcing parallelization of the inner loop creates a race condition, which is why you’re getting wrong answers.

If you can’t change your algorithm to avoid using “nb1” as an index, then your only choice is to run the inner loop sequentially.
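To illustrate, here is a minimal, untested sketch of the sequential-inner-loop version of your code: the reduction is removed and the inner loop is marked “seq”, while the outer “k” loop still runs in parallel across gangs.

```fortran
!$acc parallel loop private(nb1)
      do k = 1, n3
        nb1 = 0
!$acc loop seq
        do eid1 = 1, nb_dem
          if (Z(k) <= XYZ(3,eid1) .and. XYZ(3,eid1) <= D(k)) then
            ! safe: only one thread walks eid1, so nb1 increments in order
            nb1 = nb1 + 1
            list_z(nb1,k) = eid1
          else if (XYZ(3,eid1) <= Z(k) .and. Z(k) <= XYZ(6,eid1)) then
            nb1 = nb1 + 1
            list_z(nb1,k) = eid1
            mask1(eid1,k) = 100
          end if
        end do
        A1(k) = nb1
      end do
!$acc end parallel
```

Each gang then fills its own column list_z(:,k) in deterministic order, at the cost of losing parallelism across “eid1”.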

Hope this helps,
Mat

Hi,

I am aware that atomic operations in CUDA can solve this problem. I am wondering whether I can get the right answer by using CUDA Fortran? Or is there an atomic operation in OpenACC that can be used to solve this problem?

-bigwbxu

Does the order in which the “eid1” values are stored into list_z matter? If not, then atomic may work. Atomic just makes sure that the update to nb1 is visible to all threads, but you can’t presume that the atomics are performed in a particular order. You’ll also need to remove nb1’s reduction, make sure the outer loop uses a “gang” schedule, and the inner loop uses “vector”.

Granted, I haven’t tried this solution, so you may still run into problems.
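For what it’s worth, an untested sketch of that atomic variant might look like the following. It needs a compiler with OpenACC “atomic” support (the OpenACC 2.0 atomic construct, newer than PVF 13.10’s OpenACC level), and “my_slot” is a hypothetical local scalar I’ve added, not something from your original code.

```fortran
!$acc parallel loop gang private(nb1)
      do k = 1, n3
        nb1 = 0
!$acc loop vector
        do eid1 = 1, nb_dem
          if (Z(k) <= XYZ(3,eid1) .and. XYZ(3,eid1) <= D(k)) then
            ! atomically increment the shared counter and capture
            ! the new value as this thread's private slot
!$acc atomic capture
            nb1 = nb1 + 1
            my_slot = nb1
!$acc end atomic
            list_z(my_slot,k) = eid1
          else if (XYZ(3,eid1) <= Z(k) .and. Z(k) <= XYZ(6,eid1)) then
!$acc atomic capture
            nb1 = nb1 + 1
            my_slot = nb1
!$acc end atomic
            list_z(my_slot,k) = eid1
            mask1(eid1,k) = 100
          end if
        end do
        A1(k) = nb1
      end do
!$acc end parallel
```

A1(k) should still come out right, but the order of the entries within each column list_z(:,k) will vary from run to run.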

- Mat