illegal opcode error

Hi Mat,

I have a big outer loop (containing many inner loops) that I wish to parallelise. The data in each iteration of this outer loop is independent of every other iteration. For now, I’m happy for all the inner loops to run in serial. What is the best way to implement this?

I have used the “independent” clause to try to get this working, and privatised a number of variables that were giving me trouble.

When I compile, I now get this error:

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): illegal opcode (tblock-07.5.f90: 11914)
set_flux_gpu:

After this message, the code appears to generate a kernel, but it all runs on the CPU. Any pointers as to where I’m going wrong would be good.


 11940, Loop is parallelizable
         Accelerator kernel generated
      11940, !$acc loop gang ! blockidx%x
      11964, !$acc loop vector(128) ! threadidx%x
      12042, !$acc loop vector(128) ! threadidx%x
      ...
      14734, !$acc loop vector(128) ! threadidx%x
      14746, !$acc loop vector(128) ! threadidx%x

After this I get a load of errors telling me that various dependencies prevent parallelization of the inner loops, which I’m ignoring for now.

Chris

Sorry to tag more problems onto the same post. I’m getting the following errors:

  11939, Accelerator restriction: scalar variable live-out from loop: cfwall
         Accelerator restriction: scalar variable live-out from loop: vislam

I privatised these values to get around the problem, which is what led to the error mentioned in the previous post, but I don’t really have a good reason for doing so.

Both of these scalars are read in at the start of the program (before the GPU loop) and are then only used within the loop (albeit in an inlined function call). They are not used after the GPU loop.

Is this an issue with the inlining, and if so, is there a workaround?

Thanks,

Chris

Hi Chris,

illegal opcode

Unfortunately this is a generic internal compiler error, meaning the compiler has detected that it generated bad code. I’ve seen this in a few codes, but the cause has been different each time. I’d need you to send in your updated code which reproduces the problem, so I can pass it on to engineering for investigation.

Is this an issue with the inlining, and if so, is there a work round?

Scalar variables passed to routines (even inlined routines) can cause side effects that can’t be detected at compile time, hence the “live-out” error. I typically recommend not privatizing scalars for performance reasons, but this is one case where you need to.
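
As a rough illustration of privatizing a scalar, here is a minimal sketch. It is not taken from the tblock code: the program name, the arrays a and b, and the scalar tmp are hypothetical stand-ins for variables such as cfwall and vislam. It also assumes the scalar is assigned before it is read in every iteration, since a private copy starts out undefined.

```fortran
program private_scalar_sketch
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  real :: tmp          ! hypothetical scalar, stands in for cfwall/vislam
  integer :: i

  ! Fill the input array on the host.
  do i = 1, n
     a(i) = real(i)
  end do

  ! private(tmp) gives each iteration its own copy of tmp, so the
  ! compiler does not have to treat tmp as live-out of the loop.
  !$acc kernels loop independent private(tmp)
  do i = 1, n
     tmp = 2.0 * a(i)   ! assigned first in every iteration (assumption)
     b(i) = tmp + 1.0
  end do

  print *, b(1), b(n)
end program private_scalar_sketch
```

If a scalar is only ever read inside the loop, privatizing it this way would leave it undefined, so this pattern applies only to scalars that each iteration sets for itself.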

- Mat

Hi Mat,

I’ve just fired off an email to TRS; hopefully it will yield something.

Is there a good way of parallelising the outer loop without worrying about the internal loops if all the iterations of the outer loop are independent?

Chris

Hi Chris,

I tried your code against our development compiler and the illegal opcode error goes away. I added TPR#19296 to track your failure and to ask whether the fix for your code can get into the 13.5 release.

Also, I was able to track down where the illegal opcode is coming from in 13.4. It appears to be a problem generating the auto-reduction code for the “DAVGALL” and “DAVG_UNST” sum reduction variables. I’m able to work around the error by adding an explicit reduction clause on the kernel loop directive. (See below)

Is there a good way of parallelising the outer loop without worrying about the internal loops if all the iterations of the outer loop are independent?

Add “gang, vector” to your “kernels loop” directive. The compiler will still spit out all the dependency analysis Minfo messages for the inner loops, but they will become extraneous.

!$acc kernels loop gang vector independent reduction(+:DAVGALL,DAVG_UNST), &
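
To show the shape of this on something small, here is a minimal sketch, not the actual set_flux_gpu loop. The array x, the sizes n and m, and the scalar rowsum are hypothetical; davgall is borrowed from the thread only as a name. The “loop seq” on the inner loop is my own addition to make the serial intent explicit, and is not something the directive above requires.

```fortran
program outer_gang_vector_sketch
  implicit none
  integer, parameter :: n = 512, m = 64   ! hypothetical sizes
  real :: x(m, n)
  real :: davgall      ! name borrowed from the thread, used as a sum here
  real :: rowsum       ! hypothetical per-iteration scalar
  integer :: i, j

  x = 1.0
  davgall = 0.0

  ! The outer loop is spread across gangs and vector lanes; the explicit
  ! reduction clause replaces the compiler's automatic reduction detection.
  !$acc kernels loop gang vector independent private(rowsum) reduction(+:davgall)
  do i = 1, n
     rowsum = 0.0
     ! Each thread runs its inner loop serially (assumed intent).
     !$acc loop seq
     do j = 1, m
        rowsum = rowsum + x(j, i)
     end do
     davgall = davgall + rowsum
  end do

  print *, davgall
end program outer_gang_vector_sketch
```

With the whole schedule placed on the outer loop, each outer iteration maps to one thread, so inner loops with real dependencies are harmless as long as the outer iterations really are independent.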

Do you have data files and expected output that I can use to run and verify the code?

% tblock-07.5_dev
PGFIO-F-217/formatted read/unit=5/attempt to read past end of file.
 File name = turbine.dat    formatted, sequential access   record = 1

Would this code be available for other purposes once everything is working? Given that this is a ~3000 line kernel, it makes for a nice test for our internal QA. Plus, I’m looking for codes I can use in an OpenACC benchmarking effort I’m doing with several other companies through SPEC (spec.org). I’m not sure if the code would make a good benchmark, but I wanted to ask before investigating.

Thanks,
Mat