I noticed in the changelog of PGI 20.1:
Changed copy behavior for OpenACC reductions to adhere to the OpenACC specification. The compilers will now follow the defined behavior for copy and not copy the reduction variable to the device if it is already present. To enable the compiler’s previous behavior, use an update device/host if_present directive.
I was hoping for some clarification on this.
If I have “sum=0” followed by a reduction loop with OpenACC on it, then as far as I understood, “sum” with its value of 0 is copied to the device, and the resulting sum is stored in “sum” and is available on the CPU.
On the second call, “sum” is again copied to the device with a “0” value.
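For concreteness, here is a minimal sketch of the pattern I am asking about. It is written in C purely for illustration; the function name sum_array and the array a are hypothetical stand-ins, not taken from my actual code. The scalar is reset to 0 on the host before every reduction loop, and no data directive mentions it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum the n entries of a with an OpenACC reduction.
   sum1 is a host scalar that is reset to 0 before each reduction loop and
   is never named in a data clause. */
double sum_array(const double *a, int n)
{
    assert(a != NULL && n >= 0);

    double sum1 = 0.0;  /* host-side reset on every call */

    /* The array is copied in explicitly; the handling of sum1 is left
       entirely to the compiler. */
    #pragma acc parallel loop reduction(+:sum1) copyin(a[0:n])
    for (int i = 0; i < n; ++i) {
        sum1 += a[i];
    }

    return sum1;  /* result is expected to be available on the CPU */
}
```

The question is whether calling such a function twice in a row still starts the second reduction from 0 on the device.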
When I compile my code I see:
Generating implicit copy(sum1) [if not already present]
Does this mean that “sum1” is copied to the device with 0-value the first time, but on the second call it retains its previous final value and is not reset to 0 on the device?
If so, this is a major change and will break all my codes unless I update all collectives to manage the “sum” scalars manually.
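If I read the changelog correctly, the manual management would look something like the sketch below for a scalar that is already present on the device. This is only my assumption of what is meant: the function name repeated_sum, the enclosing data region, and the loop over calls are hypothetical and written in C for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: run the same reduction ncalls times while sum is
   already present on the device through an enclosing data region. */
double repeated_sum(const double *a, int n, int ncalls)
{
    assert(a != NULL && n >= 0);

    double sum = 0.0;

    #pragma acc data copyin(a[0:n]) copy(sum)
    {
        for (int call = 0; call < ncalls; ++call) {
            sum = 0.0;  /* reset on the host */

            /* Push the reset value to the device copy, if there is one. */
            #pragma acc update device(sum) if_present

            #pragma acc parallel loop reduction(+:sum) present(a[0:n])
            for (int i = 0; i < n; ++i) {
                sum += a[i];
            }

            /* Pull the reduced value back to the host, if present. */
            #pragma acc update self(sum) if_present
        }
    }

    return sum;
}
```

Having to add the two update directives around every reduction is the kind of change I would like to avoid if the scalar is not already present.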
Is this correct?
I had thought that scalars are treated differently from arrays when it comes to copy, update, etc., in that they are all handled implicitly to do what the programmer expects (i.e. one does not have to add every scalar to GPU memory manually, copy it, reset it in a kernel, etc.).
Is this no longer true in general or just for collectives?