Problem with some device-level intrinsics in pgi/15.1.

Some of the device-level intrinsics in 15.1 seem to be broken. The following two lines of code compile with 14.4:

      if( maxval(err) > 0.0_gpu )then

          f  = 0.9_gpu/maxval( (err/err0_rkbs23(1:3))**0.5_gpu )

This is in a device subroutine. err0_rkbs23(3) has the constant attribute, err(3) and f are local to the subroutine, and gpu is a kind parameter set to double precision.
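Roughly, the surrounding code looks like this (a simplified sketch, not the actual module; names follow the snippet above):

module ctrk_mod
use cudafor
implicit none
    integer, parameter :: gpu = kind(1.0d0)      ! "gpu" is just the double precision kind
    real(gpu), constant :: err0_rkbs23(3)        ! tolerances in constant memory
    contains
    attributes(device) subroutine rkbs23()
    real(gpu) :: err(3), f                       ! local to the subroutine
        err = 1.0e-6_gpu                         ! placeholder values, sketch only
        if( maxval(err) > 0.0_gpu )then
            f  = 0.9_gpu/maxval( (err/err0_rkbs23(1:3))**0.5_gpu )
        end if
    end subroutine rkbs23
end module ctrk_mod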

If I try to compile with 15.1, I get the following:

PGF90-S-0155-Calls from device code to a host function are allowed only in emulation mode - __pgi_mxv2hr8_1d (./ctrk_mod.f90: 7909)
  0 inform,   0 warnings,   1 severes, 0 fatal for rkbs23_

However, if I change my code to this:

      if( max(err(1),err(2),err(3)) > 0.0_gpu )then

          f  = 0.9_gpu/maxval( (err/err0_rkbs23(1:3))**0.5_gpu )

the code does compile with pgi/15.1. What I don’t understand is why the first maxval function call is causing a problem, but the second is fine.

I have similar issues with the sum and minval intrinsics.
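Presumably the same element-by-element trick works for those too, e.g. (sketch only, using placeholder scalars s and emin with the 3-element arrays above):

      s    = err(1) + err(2) + err(3)            ! instead of sum(err)
      emin = min( err(1), err(2), err(3) )       ! instead of minval(err)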

Rob.

Hi Rob,

maxval should get inlined into your kernel, so I’m not sure why the compiler is trying to call the maxval runtime routine. I tried to recreate the issue here but was unable to. Can you send a reproducing example to PGI Customer Service (trs@pgroup.com) so we can understand what’s going wrong?

Thanks,
Mat

Hi Mat,

I get the same problem with the 16.10 Community Edition on Linux.
Below is the simple code. What is going wrong…?
Thank you in advance.

CY

[root@GPUServer GPU_MOS]# cat test.cuf
module GPU_module
use cudafor
implicit none
    integer, parameter :: gNumX = 10, gLimitedRec = 256, gMaxNvvp = 40
    contains
    attributes(global) subroutine LinearRegression_gpu_all()
    implicit none
    real*8, device :: temp, LLXY_AdR2(gMaxNvvp)
        temp=MAXVAL(LLXY_AdR2(:))
    end subroutine LinearRegression_gpu_all
end module GPU_module
[root@GPUServer GPU_MOS]# /opt/pgi/linux86-64/16.10/bin/pgf90 -g -Mcuda=ptxinfo -c test.cuf
PGF90-S-0155-Calls from device code to a host function are allowed only in emulation mode - __pgi_mxv2hr8_1d (test.cuf: 9)
  0 inform,   0 warnings,   1 severes, 0 fatal for linearregression_gpu_all
[root@GPUServer GPU_MOS]#

Hi CY,

Apologies, I should have posted a follow-up to this post.

What’s happening is that the cudafor module uses generic interfaces so that you can pass device arrays to the maxval, minval, and sum intrinsics when calling them from host code. However, we don’t have a way to discern whether the call is being made from device code, so you get the host interface in your kernel.
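To illustrate the mechanism, here is a rough sketch (not the actual cudafor source) of how a module-level generic named maxval extends the intrinsic, so a matching call in any code that uses the module resolves to the module’s host-side specific:

module maxval_shadow                             ! hypothetical module, for illustration only
implicit none
interface maxval
    module procedure maxval_host_r8_1d
end interface
contains
    function maxval_host_r8_1d(a) result(m)      ! host-only specific routine
    real(8), intent(in) :: a(:)
    real(8) :: m
    integer :: i
        m = a(1)                                 ! compute the max with a loop
        do i = 2, size(a)
            if( a(i) > m ) m = a(i)
        end do
    end function maxval_host_r8_1d
end module maxval_shadow

Any code that does "use maxval_shadow" and then calls maxval on a real(8) rank-1 array gets maxval_host_r8_1d rather than the intrinsic, which is essentially what is happening with the cudafor interfaces inside your kernel.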

The simple solution is to not use the cudafor module; then the normal maxval intrinsic is used and gets inlined into the kernel. If you do need other CUDA Fortran API calls and therefore must use the module, you can rename maxval to something else on the use statement. Of course, you then won’t be able to call maxval with a device array from host code any longer, but you will from device code.

For example:

 % cat test.cuf
module GPU_module
#ifdef USE_CUDAFOR
 use cudafor, mymaxval=>maxval   ! rename so cudafor's maxval interface no longer hides the intrinsic
#endif
 implicit none
     integer, parameter :: gNumX = 10, gLimitedRec = 256, gMaxNvvp = 40
     contains
     attributes(global) subroutine LinearRegression_gpu_all()
     implicit none
     real*4, device :: temp, LLXY_AdR2(gMaxNvvp)
         LLXY_AdR2=1.0
         temp=MAXVAL(LLXY_AdR2)
     end subroutine LinearRegression_gpu_all
 end module GPU_module
% pgf90 -c -Mpreprocess test.cuf
% pgf90 -c -Mpreprocess -DUSE_CUDAFOR test.cuf

Hope this helps,
Mat

Hi Mat,

Appreciate the suggestion. It does help! Thank you very much.

CY