I have written a program with OpenACC directives, but I cannot get the right answer. Please give me some advice about it:
!$acc parallel loop private(nb1)
!$acc do reduction(+:nb1)
!$acc end parallel
Array “A1” is right, but array “list_z” is wrong. (PVF 13.10)
First, you can’t use “nb1” as one of the indices into list_z unless the “eid1” loop is run sequentially. While “nb1” is private to the outer loop, it is shared in the inner loop, so its value depends on the previous loop iteration’s value. Forcing parallelization causes a race condition, which is why you’re getting wrong answers.
If you can’t change your algorithm to avoid using “nb1” as an index, then your only choice is to run the inner loop sequentially.
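For example, it would look something like this. I’m guessing at the shape of your loop nest from the names in your post, so treat the bounds and the loop body as placeholders:

```fortran
!$acc parallel loop gang vector private(nb1)
do i = 1, n
   nb1 = 0
   ! Run the inner loop sequentially within each outer iteration,
   ! so nb1 increments in order and is safe to use as an index.
   !$acc loop seq
   do eid1 = 1, m
      if (A1(eid1, i) > 0.0) then
         nb1 = nb1 + 1
         list_z(nb1, i) = eid1
      end if
   end do
end do
```

You keep the parallelism across the outer loop, and only the compaction step itself is serialized.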
Hope this helps,
I am aware that atomic operations in CUDA can solve this kind of problem. I am wondering whether I can get the right answer by using CUDA Fortran? Or is there anything like an atomic operation in OpenACC that can be used to solve this problem?
Does the order in which the “eid1” values are stored into list_z matter? If not, then atomic may work. Atomic only guarantees that each update to nb1 happens as a single, indivisible operation that is visible to all threads; you can’t presume that the atomics are performed in any particular order. You’ll also need to remove nb1’s reduction, make sure the outer loop is a “gang” schedule, and the inner loop is “vector”.
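Something along these lines, again assuming the general shape of your loops (the “idx” scalar is one I’ve added to hold the captured counter value):

```fortran
!$acc parallel loop gang private(nb1)
do i = 1, n
   nb1 = 0
   !$acc loop vector private(idx)
   do eid1 = 1, m
      if (A1(eid1, i) > 0.0) then
         ! Atomically bump the gang-shared counter and capture its
         ! new value as this iteration's slot in list_z.
         !$acc atomic capture
         nb1 = nb1 + 1
         idx = nb1
         !$acc end atomic
         list_z(idx, i) = eid1
      end if
   end do
end do
```

Note that the entries of list_z will then land in a nondeterministic order from run to run.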
Granted, I haven’t tried this solution, so you may still run into problems.