PGF90-W-0155-Compiler failed ... with PGI 12.4

Hi,

I have code with acc regions (PGI proprietary directives) that compiles and runs fine with PGI 12.3.

If I try to compile it with PGI 12.4, I get the following messages:

compiling organize_newphysics.f90
/tmp/pgaccVLJdnhy-8syX.gpu(221): error: expression must have arithmetic or enum type

/tmp/pgaccVLJdnhy-8syX.gpu(223): error: expression must have arithmetic or enum type

/tmp/pgaccVLJdnhy-8syX.gpu(257): error: expression must have arithmetic or enum type

3 errors detected in the compilation of "/tmp/pgnvdNqLd1iOkesQS.nv0".
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (/project/s83/lapixa/COSMO_ICON_4.18_GPU/bin_gpu_castor/src/organize_newphysics.f90: 609)
organize_newphysics:
...

The corresponding -Minfo message for this kernel, which starts at line 611, is:

...
    597, Accelerator restriction: induction variable live-out from loop: .dY0013
    611, Accelerator restriction: scalar variable live-out from loop: .dY0016
         Accelerator kernel generated
        611, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
    613, Parallelization would require privatization of array 'aersea_b(i1+1)'
         Parallelization would require privatization of array 'aerdes_b(i1+1)'
         Parallelization would require privatization of array 'aerurb_b(i1+1)'
         Parallelization would require privatization of array 'aerlan_b(i1+1)'
         Parallelization would require privatization of array 'rlon_b(i1+1)'
         Parallelization would require privatization of array 'rlat_b(i1+1)'
         Parallelization would require privatization of array 'hmo3_b(i1+1)'
         Parallelization would require privatization of array 'vio3_b(i1+1)'
    642, Accelerator restriction: induction variable live-out from loop: .dY0016
    643, Accelerator restriction: induction variable live-out from loop: .dY0015
...

It seems that a kernel is generated, but the code then gives wrong results.
Does the message "expression must have arithmetic or enum type" refer to a known issue?

For comparison, when I compile with PGI 12.3 the -Minfo message is:

...
    597, Accelerator restriction: induction variable live-out from loop: .dY0013
    609, Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
    611, Accelerator restriction: scalar variable live-out from loop: .dY0016
         Accelerator kernel generated
        611, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
             CC 1.3 : 12 registers; 20 shared, 1060 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 16 registers; 4 shared, 1088 constant, 0 local memory bytes; 100% occupancy
    613, Parallelization would require privatization of array 'aersea_b(i1+1)'
         Parallelization would require privatization of array 'aerdes_b(i1+1)'
         Parallelization would require privatization of array 'aerurb_b(i1+1)'
         Parallelization would require privatization of array 'aerlan_b(i1+1)'
         Parallelization would require privatization of array 'rlon_b(i1+1)'
         Parallelization would require privatization of array 'rlat_b(i1+1)'
         Parallelization would require privatization of array 'hmo3_b(i1+1)'
         Parallelization would require privatization of array 'vio3_b(i1+1)'
    642, Accelerator restriction: induction variable live-out from loop: .dY0016
    643, Accelerator restriction: induction variable live-out from loop: .dY0015
...

Note also the message
“Accelerator restriction: induction variable live-out from loop: .dY0016 …”
I don’t know what it means (or whether it may cause issues), but I reported this problem in an earlier post this year. However, I haven’t received a solution so far.

I am unfortunately not able to reproduce the problem in a small test example but I can send you the full application.

Best regards,

Xavier

Hi Xavier,

I am unfortunately not able to reproduce the problem in a small test example but I can send you the full application.

I don’t see any reported issues that match yours, so unfortunately we’ll need you to send us the full source. Unless it’s the same COSMO source you sent me in February?

  • Mat

Hi Mat,

Yes, it is COSMO, but there are some changes. I just sent the new version to trs.

Let me know if you have trouble compiling or running the code.

Best regards,

Xavier

Hi Xavier,

I was able to replicate the problem here and have sent a report off to engineering (TPR#18694). I have the problem fairly well isolated so hopefully it’s something they can fix easily.

  • Mat

Hi Mat,

In the meantime, I also tried to compile the code with -Mcuda, as I need to use some CUDA functions inside the code.

I am getting several errors:

compiling data_flake.f90
PGF90-F-0000-Internal compiler error. cf_data_init: incorrect offset       0 (/project/s83/lapixa/GPU/PGI/COSMO_ICON_4.18_GPU_r2042/src/data_flake.f90: 279)
gmake: *** [/project/s83/lapixa/GPU/PGI/COSMO_ICON_4.18_GPU_r2042/obj/data_flake.o] Error 2

The strange thing is that this file doesn’t have any acc statement.

If I then compile only this file without -Mcuda and the rest with it, I can compile up to another file (with acc directives), where I get the following error:

/tmp/pgf90-YXc9w7PTIdS.s: Assembler messages:
/tmp/pgf90-YXc9w7PTIdS.s:75211: Error: symbol `.STATICS2' is already defined
/tmp/pgf90-YXc9w7PTIdS.s:81753: Error: symbol `.BSS2' is already defined
/tmp/pgf90-YXc9w7PTIdS.s:97032: Error: symbol `.STATICS2' is already defined
/tmp/pgf90-YXc9w7PTIdS.s:103453: Error: symbol `.BSS2' is already defined
gmake: *** [/project/s83/lapixa/COSMO_ICON_4.18_GPU_dev/obj/organize_newphysics.o] Error 2

Unless any of these messages points to an obvious known issue, I have just sent trs a correction to the makefile so that these problems can be reproduced.

Could you also have a look at these errors?

PGI version: 12.3

Hi Xavier,

This looks to be the same issue as TPR#18639, which produces the same ICE message when compiling certain code with “-Mcuda”. As is the case with yours, the code does not contain any CUDA Fortran constructs.

This issue has been fixed, and the fix is available in the 12.5 release.

Best Regards,
Mat

Hi,

Is there any news concerning the “PGF90-W-0155-Compiler failed to translate accelerator region” error I was seeing in my code (TPR#18694)?

I have tried with 12.5 and now get a different message, and still can’t compile:

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Load of NULL symbol (../src/organize_newphysics.f90: 551)
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Load of NULL symbol (../src/organize_newphysics.f90: 578)

If you have any suggestions for a workaround, or a way to identify the problem, I would be interested.

Note that, as already mentioned, the code works with 12.3, although I always get strange messages about live-out variables, which I would like to understand, e.g.:

    574, Accelerator restriction: induction variable live-out from loop: .dY0008
    575, Accelerator restriction: induction variable live-out from loop: .dY0007
    578, Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
    580, Accelerator restriction: scalar variable live-out from loop: .dY0010
         Accelerator kernel generated
        580, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
             CC 1.3 : 24 registers; 184 shared, 12 constant, 0 local memory bytes; 50% occupancy
             CC 2.0 : 20 registers; 4 shared, 196 constant, 0 local memory bytes; 100% occupancy

Best regards,

Xavier

Hi Xavier,

Yes. According to the notes in the TPR report, the error seemed to get worse in 12.5 but is fixed in the 12.6 pre-release compilers. I double-checked and can confirm that the original error no longer occurs. However, the new assembler errors with -Mcuda do still occur in the pre-release. I’ll report this and hopefully we can get them fixed before 12.6 is released.

Thanks,
Mat

Great news, thanks a lot.

Also, could you tell me if the live-out variable messages have disappeared in the pre-release? I think they may be related to other issues. Indeed, in order to be able to compile and run the code with previous PGI releases (12.3 and below), I had to trick the compiler by adding some dummy loops. Here is an example (in organize_newphysics.f90):



          !$acc region do kernel, parallel &
          !$acc& vector(256), private(ip,k,i,j), independent
          DO ip = 1, ipend
             !XL: PGI 12.1 bug?: This dummy loop forces the compiler to execute the ip loop on accelerator
             DO k = 1, 1 !dummy k loop
                i = mind_ilon(ip,ib)
                j = mind_jlat(ip,ib)
                skyview_b(ip) = skyview(i,j)
                slo_asp_b(ip) = slo_asp(i,j)
                slo_ang_b(ip) = slo_ang(i,j)
             ENDDO !end dummy k loop
          ENDDO
          !$acc end region
       ENDIF

If I remove the k loop going from 1 to 1, the compiler fails to generate an accelerated region.
The workaround is fine, but I don’t know if I am hiding some more serious issues with it.

Also, since I sent you the code, I have noticed some errors (on my side) with respect to some array updates. They don’t affect any of the messages nor the compiler error with 12.4 and 12.5, but to be sure I’ll send you a corrected version of the code via trs.

Xavier

Hi Xavier,

Also, could you tell me if the live-out variable messages have disappeared in the pre-release?

Yes, these have been fixed as well. The problem was that a temporary variable holding a loop trip count was not being marked as private. Since the loop was in a contained subprogram, this made the variable visible to the outer subprogram, hence the ‘live-out’ message.
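The code shape described here can be sketched as follows. This is an illustration only: the names host_demo, fill, n and a are hypothetical and not taken from COSMO, and the example is not expected to reproduce the message on its own. With a compiler that ignores the acc directives it simply runs on the host.

```fortran
! Illustrative shape only: a loop inside a contained subprogram.
! All names here (host_demo, fill, n, a) are hypothetical.
program host_demo
  implicit none
  integer, parameter :: n = 8
  real :: a(n)

  call fill()
  print *, a

contains

  subroutine fill()
    integer :: i
    ! For a loop like this the compiler keeps the trip count in a hidden
    ! temporary (presumably the ".dY00xx" names seen in the -Minfo output).
    ! If that temporary is not treated as private to the loop, it can look
    ! live-out to the enclosing program unit.
    !$acc region
    do i = 1, n
       a(i) = real(i)
    end do
    !$acc end region
  end subroutine fill

end program host_demo
```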

Indeed in order to be able to compile and run the code with previous PGI release (12.3 and below), I had to trick the compiler, by adding some dummy loops. Here is an example (in organize_newphysics.f90):

One of your colleagues reported a similar issue in which serial sections of device code weren’t being offloaded correctly to the GPU. This will also be fixed in 12.6.

They don’t affect any of the messages nor the compiler error with 12.4 and 12.5, but to be sure I’ll send you a corrected version of the code via trs.

Please do. Tiziano’s issue may or may not be related to yours, so if there is still a problem, we’d like to know.

Thanks,
Mat

Hi Xavier,

I confirmed that 12.6 does fix this issue as well. However, I hit a different issue in another file where we’re hitting the 256-byte CUDA kernel argument size limit. I’ve submitted another report (TPR#18752) and put it at the highest priority. My hope is that it will get fixed before 12.6 is out and you’ll never see it.

  • Mat

Hi Mat,

I’ve just tested this code with 12.6 and I am actually hitting the

Kernel argument list is > 256 bytes, the max supported by CUDA

problem, for several kernels.

Xavier

Yes, they weren’t able to get this fixed in time for 12.6. It’s currently scheduled for this month’s 12.8 release due out in a few weeks.

  • Mat

Hi Mat,

I can’t see any reference to TPR#18752 on the release page for 12.8. Should I assume it is not yet solved, or should I try downloading 12.8?

Xavier

Hi Xavier,

Sorry about this but in our issue tracking system it looks like they’ve moved the target release of this one to 12.9. I’ll send a note to Michael and see where his team is at on it.

  • Mat

Hi Xavier,

Michael’s back and let me know that the fix for TPR#18752 should be available in 12.9. The fix is being tested now in our internal development compiler and, if all is good, will be moved into the pre-release compiler.

  • Mat

Hi Mat,

Thanks for the feedback.

I have now also fully translated the code from PGIacc (PGI proprietary directives) to OpenACC, and I am seeing the kernel argument list size message there as well when compiling with PGI 12.6.

Could you confirm that the fix for the PGIacc code will also work when compiling OpenACC code? (It seems that things are sometimes treated differently depending on whether it is PGIacc or OpenACC.)

I can send you an example code.

Cheers,

Xavier

I can send you an example code.

Please do. Though as of 12.6, PGI Acc and OpenACC kernels constructs should produce the same generated code. But it would be good for me to double-check with your example.

  • Mat
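For reference, here is a minimal sketch of how the gather loop quoted earlier in the thread might look with an OpenACC kernels construct instead of the PGI region directive. It is not the COSMO code: the array sizes, the fill values and the program name gather_demo are assumptions made so the example is self-contained, only one of the three gathered fields is shown, and the dummy k loop is left out on the assumption that a fixed compiler no longer needs it.

```fortran
! Sketch only: OpenACC form of the index-gather loop from the thread.
! Sizes, fill values and the program name are hypothetical.
program gather_demo
  implicit none
  integer, parameter :: ie = 4, je = 3, ipend = 6, ib = 1
  integer :: mind_ilon(ipend,ib), mind_jlat(ipend,ib)
  real    :: skyview(ie,je), skyview_b(ipend)
  integer :: ip, i, j

  ! Fill the 2-D field and the index lists with simple test values.
  do j = 1, je
     do i = 1, ie
        skyview(i,j) = real(10*i + j)
     end do
  end do
  do ip = 1, ipend
     mind_ilon(ip,ib) = mod(ip-1, ie) + 1
     mind_jlat(ip,ib) = mod(ip-1, je) + 1
  end do

  ! Gather the 2-D field into a 1-D block, one element per ip.
  !$acc kernels
  !$acc loop independent gang vector(256) private(i,j)
  do ip = 1, ipend
     i = mind_ilon(ip,ib)
     j = mind_jlat(ip,ib)
     skyview_b(ip) = skyview(i,j)
  end do
  !$acc end kernels

  print *, skyview_b
end program gather_demo
```

Without an accelerator-enabled compiler the directives are treated as comments and the loop runs on the host, which makes it easy to compare results between the two directive styles.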