I am currently working with the PGI compiler 12.6 and have run into some unexpected problems.
With 12.5 all the results are correct, but when I compile the exact same source code with the PGI 12.6 compiler (no warnings!), both the runtime and the results differ: the results are wrong and the runtime is totally different.
I realized that the 12.6 compiler chooses to distribute the work in a different way. However, the results should still be correct, right?
Have you encountered similar problems? What could be the reason for this behaviour?
First, you could inspect the GPU code the compiler produces by adding this flag:
-ta=nvidia,keepgpu
If the compiler really does produce incorrect code, file a bug report and send it to the PGI team. Hopefully they can suggest a workaround for now. Or you can try to find your own.
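To tell a real miscompile apart from harmless rounding differences, it can help to compare the output against a known-good run (for example the 12.5 build, or a serial CPU build). A different work distribution can change the order of floating-point additions, so tiny differences are expected while large ones point to a bug. A minimal sketch in C; the function name `results_match` and the tolerance are my own assumptions, not anything PGI provides:

```c
/* Hypothetical helper: returns 1 if every element of `test` agrees with
   the reference `ref` to within `tol`, and 0 otherwise. */
int results_match(const double *ref, const double *test, int n, double tol)
{
    for (int i = 0; i < n; ++i) {
        double d = ref[i] - test[i];
        if (d < 0.0) {
            d = -d;               /* absolute difference */
        }
        /* !(d <= tol) also rejects NaN, which compares false to everything */
        if (!(d <= tol)) {
            return 0;
        }
    }
    return 1;
}
```

You would fill `ref` from the known-good build and `test` from the 12.6 build, then pick `tol` to suit the precision of your computation.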
Are you going to implement it in the next release?
I think they are looking towards the 12.9/12.10 time frame. We’re trying to get back on schedule with a 12.8 release next week and I don’t think they’ll have this done in time.
Mat, would it be possible to send you the source code via mail?
Yes. Please send it to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me. If it’s a compiler bug, I’ll triage it, submit a report, and hopefully find a workaround.
I just wanted to let you know that the exact same source code works again with
PGI compiler 12.9. (12.8 did not work either.)
I realized that 12.9 schedules the work as follows:
121, Loop is parallelizable
Accelerator kernel generated
121, #pragma acc loop gang /* blockIdx.x */
CC 2.0 : 27 registers; 32 shared, 136 constant, 0 local memory bytes
131, #pragma acc loop vector(256) /* threadIdx.x */
while 12.8 does the following:
121, #pragma acc loop gang /* blockIdx.x threadIdx.x */
Cached references to size [(x)] block of 'bhat'
CC 2.0 : 27 registers; 32 shared, 136 constant, 0 local memory bytes
131, #pragma acc loop vector(32) /* threadIdx.y */
These two look very similar, but 12.8 reports something about threadIdx.x in line 121. This is kind of strange since the feedback doesn’t say anything about vector in this line (different from 12.9).
What does the /* … */ part stand for anyway?
@Mat: This is the same version I filed a bug-report for earlier.
It’s information about how the OpenACC schedule corresponds to the target device schedule. For NVIDIA CUDA, a “gang” corresponds to a “block” and a “vector” to a “thread”. The “.x”, “.y”, and “.z” are the dimensions.
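To make the correspondence concrete, here is a minimal sketch in C of a loop nest scheduled the way the 12.9 feedback reports it: the outer loop across gangs, the inner loop across a 256-wide vector. The function `row_sums` and its arguments are hypothetical and not the code from this thread. I am also assuming a `parallel` region with `vector_length(256)`; the original code may well have used a `kernels` region with `vector(256)` instead.

```c
/* Hypothetical example: sums each row of an n-by-m row-major matrix. */
void row_sums(const float *a, float *out, int n, int m)
{
    /* Outer loop: iterations are spread across gangs. On an NVIDIA
       target each gang becomes a thread block (blockIdx.x). */
    #pragma acc parallel loop gang vector_length(256) \
        copyin(a[0:n*m]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;

        /* Inner loop: iterations are spread across the vector lanes of
           one gang, i.e. the threads of one block (threadIdx.x). */
        #pragma acc loop vector reduction(+:sum)
        for (int j = 0; j < m; ++j) {
            sum += a[i * m + j];
        }
        out[i] = sum;
    }
}
```

Without OpenACC enabled the pragmas are ignored and the function runs serially, which gives a handy reference result. Stating the gang and vector clauses explicitly like this is also one way to keep the schedule from changing between compiler releases.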