I have a big outer loop (containing many inner loops) that I wish to parallelise. Each iteration of this outer loop is independent of the others. For now, I’m happy for all the inner loops to run serially; what is the best way to implement this?
I have used the “independent” clause to try to get this working, and privatised a number of variables that were giving me trouble.
When I compile, I now get this error:
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): illegal opcode (tblock-07.5.f90: 11914) set_flux_gpu:
After this message, the code appears to generate a kernel, but it all runs on the CPU. Any pointers as to where I’m going wrong would be good.
11940, Loop is parallelizable
       Accelerator kernel generated
    11940, !$acc loop gang ! blockidx%x
    11964, !$acc loop vector(128) ! threadidx%x
    12042, !$acc loop vector(128) ! threadidx%x
    ...
    14734, !$acc loop vector(128) ! threadidx%x
    14746, !$acc loop vector(128) ! threadidx%x
After this I get a load of errors telling me that various dependencies are preventing parallelization of the inner loops, which I’m ignoring for now.
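
To make the intended structure concrete, here is a minimal sketch of the set-up described at the top of this post: an independent outer loop with a privatised scratch variable and an inner loop forced to run serially. It is not the real set_flux_gpu code. The program name, array names, and sizes are all hypothetical, and the directive spelling assumes the `!$acc loop` form that appears in the -Minfo output above; the actual source may spell the directives differently.

```fortran
program outer_parallel_sketch
  implicit none
  integer, parameter :: n = 1000, m = 64   ! hypothetical sizes
  real :: a(m, n), total(n)
  real :: tmp                              ! scratch scalar, one copy per outer iteration
  integer :: i, j

  a = 1.0

  ! Each outer iteration i reads only column i of a and writes only total(i),
  ! so the iterations do not depend on one another.
  !$acc kernels copyin(a) copyout(total)
  !$acc loop independent private(tmp)
  do i = 1, n
     tmp = 0.0
     ! Ask for the inner loop to run sequentially within each outer iteration.
     !$acc loop seq
     do j = 1, m
        tmp = tmp + a(j, i)
     end do
     total(i) = tmp
  end do
  !$acc end kernels

  ! Every entry of total should be the sum of m ones.
  print *, total(1), total(n)
end program outer_parallel_sketch
```

Without accelerator support enabled the directives are treated as comments, so the same file should also build and run as plain serial Fortran, which is a handy way to check results against the accelerated version.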