Compilation of a device function with CUDA Fortran

Hello,
the compiler is frustrating me with the following error:

PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported procedure (em_fw.cuf: 1)
PGF90/x86-64 Linux 15.7-0: compilation aborted
pgf90-Fatal-f902 completed with exit code 1

I don’t know what the problem could be. I followed the compiler manual, and the kernel itself compiles without complaint.
I invoke the compiler with

pgf90 -c -v -Mcuda -Minfo=all em_fw.cuf

and I obtain the output

Export PGI=/share/apps/pgi

/share/apps/pgi/linux86-64/15.7/bin/pgf901 em_fw.cuf -opt 1 -nohpf -nostatic -x 19 0x400000 -quad -x 59 4 -x 15 2 -x 49 0x400004 -x 51 0x20 -x 57 0x4c -x 58 0x10000 -x 124 0x1000 -tp haswell -x 57 0xfb0000 -x 58 0x78031040 -x 47 0x08 -x 48 4608 -x 49 0x100 -x 120 0x200 -stdinc /share/apps/pgi/linux86-64/15.7/include-gcc44:/share/apps/pgi/linux86-64/15.7/include:/usr/local/include:/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include:/usr/include -cmdline '+pgf90 em_fw.cuf -c -v -Mcuda -Minfo=all' -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__ -def __NO_MATH_INLINES -def __x86_64 -def __x86_64__ -def __LONG_MAX__=9223372036854775807L -def '__SIZE_TYPE__=unsigned long int' -def '__PTRDIFF_TYPE__=long int' -def __THROW= -def __extension__= -def __amd_64__amd64__ -def __k8 -def __k8__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -idir /share/apps/pgi/linux86-64/2015/cuda/6.5/include -def _CUDA -ccff -freeform -x 137 1 -x 180 0x4000000 -cudaver 6.5 -vect 48 -y 54 1 -def __CUDA_API_VERSION=6050 -x 70 0x40000000 -x 189 0x8000 -y 163 0xc0000000 -x 189 0x10 -x 137 1 -modexport /tmp/pgf90VGFvn6SZp5q9.cmod -modindex /tmp/pgf90NGFv1l0KW0QO.cmdx -output /tmp/pgf90-GFv9Dx35pqg.ilm
  0 inform,   0 warnings,   0 severes, 0 fatal for cuda_em_fw
  0 inform,   0 warnings,   0 severes, 0 fatal for computeref
  0 inform,   0 warnings,   0 severes, 0 fatal for ref
PGF90/x86-64 Linux 15.7-0: compilation successful

/share/apps/pgi/linux86-64/15.7/bin/pgf902 /tmp/pgf90-GFv9Dx35pqg.ilm -fn em_fw.cuf -opt 1 -x 51 0x20 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 117 0x1000 -quad -x 59 4 -tp haswell -x 120 0x1000 -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 49 0x100 -x 120 0x200 -astype 0 -x 137 1 -x 180 0x4000000 -cudaver 6.5 -x 176 0x100 -cudacap 20 -cudacap 30 -cudacap 35 -cudacap 50 -cudaver 6.5 -x 70 0x40000000 -x 124 1 -x 189 0x8000 -y 163 0xc0000000 -x 189 0x10 -y 189 0x4000000 -x 137 1 -x 180 0x4000000 -x 176 0x100 -cudacap 20 -cudacap 30 -cudacap 35 -cudacap 50 -cudaver 6.5 -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -cmdline '+pgf90 em_fw.cuf -c -v -Mcuda -Minfo=all' -asm /tmp/pgf90FGFvDQ16Kx2u.s
  0 inform,   0 warnings,   0 severes, 0 fatal for cuda_em_fw
  0 inform,   0 warnings,   0 severes, 0 fatal for computeref
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported procedure (em_fw.cuf: 1)
PGF90/x86-64 Linux 15.7-0: compilation aborted
pgf90-Fatal-f902 completed with exit code 1

Unlinking /tmp/pgf90-GFv9Dx35pqg.ilm
Unlinking /tmp/pgf903GFvLnZT9_VH.stb
Unlinking /tmp/pgf90VGFvn6SZp5q9.cmod
Unlinking /tmp/pgf90NGFv1l0KW0QO.cmdx
Unlinking /tmp/pgf90FGFvDQ16Kx2u.s
Unlinking /tmp/pgf90xGFvf_c90sEB.ll

The function ref is declared as

attributes(device) real function ref(n,d,c,rho,alf,theta,freq)
    implicit none
    integer, device, intent(in)              :: n
    real, device, dimension(2:n), intent(in) :: d
    real, device, dimension(n+1), intent(in) :: c, rho, alf
    real, device, intent(in)                 :: theta, freq
    complex, device, dimension(2:n)          :: zin
    complex, device, dimension(n+1)          :: z
    complex, device, dimension(n+1)          :: th
    complex, device, dimension(2:n)          :: s, phi
    complex, device, dimension(n+1)          :: v, k
    integer, device                          :: i
[...]
end function ref

and I really don’t do anything fancy within the function (unless vector assignment is fancy and I never realised it).

Thanks in advance for any help!

Hi EricMan,

It’s probably a compiler-generated procedure call. When an array section is passed as an argument to a routine, the compiler may need to create a temporary array, and it does so through a call to a helper procedure. How is “ref” being called?
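
To illustrate the kind of call meant here, a minimal sketch follows. The names `section_demo`, `row_sum` and `sum_rows` are hypothetical and do not come from the code in this thread. Passing a strided section such as a(i,:) to an explicit-shape dummy argument may force the compiler to copy it into a contiguous temporary, whereas passing the whole array plus an index avoids any section at the call site.

```fortran
module section_demo
  use cudafor
  implicit none
contains

  ! Hypothetical device function: sums row i of the n-by-n array a.
  ! It receives the whole array plus an index, so no section is passed.
  attributes(device) real function row_sum(a, n, i)
    integer, value   :: n, i
    real, intent(in) :: a(n, n)
    integer          :: j
    row_sum = 0.0
    do j = 1, n
      row_sum = row_sum + a(i, j)
    end do
  end function row_sum

  ! Hypothetical kernel: one thread per row.
  attributes(global) subroutine sum_rows(a, n, out)
    integer, value   :: n
    real, intent(in) :: a(n, n)
    real             :: out(n)
    integer          :: i
    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    if (i <= n) then
      ! A call that passed the strided section a(i, :) instead would
      ! possibly need a compiler-generated temporary on the device.
      out(i) = row_sum(a, n, i)
    end if
  end subroutine sum_rows

end module section_demo
```

Whether a given section actually triggers a temporary depends on the compiler version and on how the dummy argument is declared, so this is only a pattern to check against, not a diagnosis.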

If you can, please send a reproducible example to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me. I can then confirm whether this is the problem and may be able to suggest workarounds.

Thanks,
Mat

Hi mkcolg,

I do call ref from the kernel as

attributes(global) subroutine computeRef(n,d,c,rho,alf,thetaArray,freqArray,r)
    implicit none
    integer(ib), value, intent(in) :: n
    real(rp), dimension(2:n), intent(in) :: d
    real(rp), dimension(n+1), intent(in) :: c, rho, alf
    real(rp), dimension(10), intent(in) :: thetaArray
    real(rp), dimension(4) , intent(in) :: freqArray
    real(rp), dimension(4,10) :: r

    integer :: i,j

    i = (blockIdx%x-1)*blockDim%x + threadIdx%x
    j = (blockIdx%y-1)*blockDim%y + threadIdx%y

    if ( ( i <= 4) .and. (j <= 10 ) ) then
       r(i,j) = ref(n,d,c,rho,alf,thetaArray(j),freqArray(i))
       !write(*,*) i, j, thetaArray(j), freqArray(i), r(i,j)
    end if
end subroutine computeRef

I am sending an email to Customer Service with the test program that I am unable to compile attached.

(Note that the code compiles and works properly if I use -Mcuda=emu.)

Thanks,
Eric

Hi Eric,

Thanks for sending in the example. The problem was a known issue where we weren’t handling complex return types properly in CUDA Fortran device functions. The error was fixed in the 16.1 release and I was able to successfully build and run your example.
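
For anyone who cannot move to 16.1 or later, one possible way around this class of problem is to return the complex value through an intent(out) argument of a device subroutine rather than as a function result. The sketch below is an assumption, not something verified against 15.7: the names `complex_workaround`, `layer_impedance` and `reflect`, and the formulas inside them, are hypothetical placeholders and are not taken from the example that was sent in.

```fortran
module complex_workaround
  implicit none
contains

  ! Hypothetical replacement for a device function with a complex result:
  ! the complex value comes back through the intent(out) argument z.
  attributes(device) subroutine layer_impedance(rho, c, alf, z)
    real, value          :: rho, c, alf
    complex, intent(out) :: z
    z = cmplx(rho*c, alf)      ! placeholder formula, for illustration only
  end subroutine layer_impedance

  ! Hypothetical caller: the function result itself stays real, and the
  ! complex intermediates live in local variables.
  attributes(device) real function reflect(rho1, c1, alf1, rho2, c2, alf2)
    real, value :: rho1, c1, alf1, rho2, c2, alf2
    complex     :: z1, z2
    call layer_impedance(rho1, c1, alf1, z1)
    call layer_impedance(rho2, c2, alf2, z2)
    reflect = abs((z2 - z1)/(z2 + z1))
  end function reflect

end module complex_workaround
```

The design choice is simply to keep complex values out of function return slots; upgrading the compiler remains the cleaner fix.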

Thanks!
Mat

% pgf90 em_ref_nlay3.cuf -fast -V15.7
PGF90-F-0000-Internal compiler error. Unhandled return type for function       4 (em_ref_nlay3.cuf: 120)
PGF90/x86-64 Linux 15.7-0: compilation aborted
% pgf90 em_ref_nlay3.cuf -fast -V16.3
% a.out
   0.8678  0.8695  0.6196  0.3704  0.4103  0.2182  0.3084  0.1129  0.2002  0.0822
   0.8587  0.8557  0.6013  0.5529  0.1219  0.2724  0.2374  0.0698  0.3867  0.2231
   0.8688  0.8652  0.3527  0.2064  0.1539  0.0528  0.1628  0.2037  0.2551  0.0981
   0.8729  0.8648  0.3843  0.5624  0.2415  0.3341  0.2845  0.0959  0.2797  0.1342

I’ve been able to compile and run the example, and it gives the same numerical results as the CPU code. Thanks a million.