Hi Mat, and thank you.
I did a couple of experiments last week.
1) First, we solved the NaN problem by moving the IF statement out of our parallel region.
!$acc end parallel
Why was this happening? Is it because of my “FLAG”? Does every thread have its own copy of the FLAG variable?
2) I have two adjacent loops inside this parallel region. The first loop uses the reduction clause, and I need its results in the second loop.
msys = 0
tmpx = 0
tmpy = 0
tmpz = 0
!$acc loop vector reduction(+:msys,tmpx,tmpy,tmpz)
msys = msys + m(i)
tmpx = tmpx + vx(i)
!$acc loop gang vector
vxb(i) = vx(i) + tmpx/msys
vyb(i) = vy(i) + tmpy/msys
First of all, why do I need the VECTOR clause in the first loop? (I get wrong results with GANG VECTOR.) Is it because of the reduction?
Second, is it possible to get different results from two different runs of my program? In your article you say that there is a barrier at the end of the parallel region, not at the end of the first loop. So I believe that some thread may not have the correct values for the calculations in the second loop (for example, an incorrect value of the msys variable).
3) When I change my parallel region into a kernels region I have another problem. In the NVIDIA Visual Profiler I can see communication between host and device when my program reaches that region, and I can’t figure out the reason (is it because of the reduction?). There is a copy from host to device and then, after the loop, a copy back to the host. With the parallel construct I don’t see that communication and I get better timings. Why is that happening?
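To make question 2 concrete, the two-loop pattern can be sketched as a self-contained program. This is only a sketch under assumptions: the size n, the kind parameter dp, the initial values, and the z-component arrays vz and vzb are made up for illustration and are not the actual code.

```fortran
program reduction_then_use
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000          ! assumed problem size
  integer :: i
  real(dp) :: m(n), vx(n), vy(n), vz(n)   ! vz is an assumption
  real(dp) :: vxb(n), vyb(n), vzb(n)      ! vzb is an assumption
  real(dp) :: msys, tmpx, tmpy, tmpz

  ! Arbitrary initial values, only so the program runs.
  do i = 1, n
     m(i)  = 1.0_dp
     vx(i) = real(i, dp)
     vy(i) = 2.0_dp
     vz(i) = 3.0_dp
  end do

  !$acc parallel
  msys = 0
  tmpx = 0
  tmpy = 0
  tmpz = 0
  ! First loop: vector-only reduction, as in the post.
  !$acc loop vector reduction(+:msys,tmpx,tmpy,tmpz)
  do i = 1, n
     msys = msys + m(i)
     tmpx = tmpx + vx(i)
     tmpy = tmpy + vy(i)
     tmpz = tmpz + vz(i)
  end do
  ! Second loop: reads the reduced values from the first loop.
  !$acc loop gang vector
  do i = 1, n
     vxb(i) = vx(i) + tmpx/msys
     vyb(i) = vy(i) + tmpy/msys
     vzb(i) = vz(i) + tmpz/msys
  end do
  !$acc end parallel

  print *, vxb(1), vyb(1), vzb(1)
end program reduction_then_use
```

When this is built without OpenACC support, the directives are treated as comments and the program runs serially, which gives a reference result to compare the accelerated runs against.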
Thank you for your help,