I’m working on porting a large Fortran code to GPU with OpenACC and I have run into an algorithmic problem. Maybe some experienced programmers have an idea that could help me solve it. Simplified, the code looks like this:
do i=1, nx
   index = position(i)
   if ( abs(a(index)) < abs( b(i) ) ) then
      a(index) = b(i)
   end if
end do
It looks for the value with the maximum absolute value, but index is not unique in the array position(:), so I need an OpenACC atomic directive for the assignment. However, atomic only accepts a single assignment statement (with or without an intrinsic) and no test (I’ve already tried to rewrite the test and the assignment in one line).
With an atomic directive only on the assignment line, I do not see how to guarantee that another thread will not change the value of a(index) between the test and the assignment.
Any ideas on how to rewrite this for a GPU with OpenACC?
Thanks for your help.
Correct, atomics wouldn’t work here, since an atomic operation needs to be commutative. Here the order matters, which causes a loop dependency.
You can run this loop sequentially on the GPU via a “!$acc serial loop” directive, but I’m not seeing a good way to parallelize it.
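A minimal sketch of that sequential fallback, assuming the loop has the shape described in the question. The program name, the array sizes and the test data are made up for illustration; only a, b, position, index (written idx here) and nx come from the original post.

```fortran
program absmax_serial
  implicit none
  ! Sizes and data below are hypothetical, chosen only to exercise the loop.
  integer, parameter :: nx = 8, na = 3
  integer :: i, idx
  integer :: position(nx)
  real    :: a(na), b(nx)

  position = [1, 2, 1, 3, 2, 1, 3, 2]      ! indices repeat on purpose
  b        = [1.0, -4.0, -5.0, 2.0, 3.0, 2.0, -1.0, 0.5]
  a        = 0.0

  ! One device thread walks the loop in order, so there is no race on a(idx).
  !$acc serial loop private(idx) copyin(position, b) copy(a)
  do i = 1, nx
     idx = position(i)
     if (abs(a(idx)) < abs(b(i))) a(idx) = b(i)
  end do

  print *, a
end program absmax_serial
```

This keeps the data on the device between surrounding kernels, but the loop itself gets no speed-up. Without an OpenACC compiler the directive is just a comment and the program runs on the host.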
I was thinking about this over the weekend. Maybe duplicating the a array into a_min and a_max would allow parallelism, since each update then only requires an intrinsic in the atomic directive (max() or min()). A second loop could then run in parallel to pick, from a_min and a_max, the value with the larger absolute value, since index is unique at that point for a, a_min and a_max.
Maybe this would be faster than a sequential execution for several tens of thousands of values, even with this redundancy.
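A minimal sketch of this two-pass idea, under the same assumptions about the loop as in the question. The program name, array sizes and test data are hypothetical; a_min and a_max are the work arrays proposed above. It relies on the OpenACC atomic update form x = max(x, expr) / x = min(x, expr), which the specification allows for Fortran.

```fortran
program absmax_two_pass
  implicit none
  ! Sizes and data below are hypothetical, chosen only to exercise the loops.
  integer, parameter :: nx = 8, na = 3
  integer :: i, idx
  integer :: position(nx)
  real    :: a(na), a_min(na), a_max(na), b(nx)

  position = [1, 2, 1, 3, 2, 1, 3, 2]      ! indices repeat on purpose
  b        = [1.0, -4.0, -5.0, 2.0, 3.0, 2.0, -1.0, 0.5]
  a        = 0.0

  ! Start both work arrays from the current contents of a.
  a_min = a
  a_max = a

  !$acc data copyin(position, b, a_min, a_max) copy(a)

  ! Pass 1: signed max and min per target index; each update is a single
  ! atomic statement, so the order of iterations no longer matters.
  !$acc parallel loop private(idx)
  do i = 1, nx
     idx = position(i)
     !$acc atomic update
     a_max(idx) = max(a_max(idx), b(i))
     !$acc atomic update
     a_min(idx) = min(a_min(idx), b(i))
  end do

  ! Pass 2: every index is unique here, so no atomics are needed.
  ! Tie-break (an assumption): prefer a_max when both magnitudes are equal.
  !$acc parallel loop
  do i = 1, na
     if (abs(a_min(i)) > abs(a_max(i))) then
        a(i) = a_min(i)
     else
        a(i) = a_max(i)
     end if
  end do

  !$acc end data

  print *, a
  ! Same result as the sequential loop for this data set.
  if (any(a /= [-5.0, -4.0, 2.0])) error stop "unexpected result"
end program absmax_two_pass
```

One difference from the sequential loop: when two candidates for the same index have equal magnitude and opposite sign (for example 3.0 and -3.0), the sequential version keeps whichever comes first, while this sketch always keeps the positive one. The absolute value is the same in both cases, so this only matters if the sign of a tie is significant.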
This is the typical way to parallelize min/max loc type operations that also need to capture the value at that position. I wasn’t sure it would work here because of the look-up array, “position”, but maybe.