Problems with !$cuf directive

Hi!

I’m trying to use the !$cuf directive in my program. It compiles without problems, but I get an LNK2019 error at link time. I thought I had done something wrong, but when I tried to build the cufkernel.cuf example with the supplied makefile, I got exactly the same error:

pgfortran -fast -o cufkernel1 cufkernel.cuf
pgfortran8gsa82JO0-a1A.obj : error LNK2019: unresolved external symbol _M1_SUB1@32 referenced in function _M1_SUB2@24
cufkernel1.exe : fatal error LNK1120: 1 unresolved externals

I wrote a simple example that reproduces this error:

module test
  implicit none
  real*8,device,allocatable :: arr(:,:,:)
  integer,constant :: Nx,Ny,Nz
contains

  subroutine cuf_test
    integer :: i,j,k
    !$cuf kernel do(3)<<<*,512>>>
    do k=1,Nz
      do j=1,Ny
        do i=1,Nx
          arr(i,j,k)=arr(i,j,k)**2
        end do
      end do
    end do

  end subroutine cuf_test

end module

program test_prg
  use cudafor, only: cudaThreadSynchronize
  use test
  implicit none
  integer ierr

  allocate(arr(32,32,32))
  Nx=32; Ny=32; Nz=32
  arr=5.

  call cuf_test
  ierr = cudaThreadSynchronize()
end program

test.obj : error LNK2019: unresolved external symbol _TEST_CUF_TEST@44 referenced in function _TEST_CUF_TEST@0
test.exe : fatal error LNK1120: 1 unresolved externals

Without the constant attribute on Nx, Ny, Nz, the message is slightly different:

test.obj : error LNK2019: unresolved external symbol _TEST_CUF_TEST@12 referenced in function _TEST_CUF_TEST@0
test.exe : fatal error LNK1120: 1 unresolved externals

And if i, j and k have the device attribute, I get the following error:

PGF90-S-0155-Non-tightly nested loop at nest 1 (test.f90: 10)
PGF90-S-0155-Kernel region ignored; no parallel loops (test.f90: 9)
0 inform, 0 warnings, 2 severes, 0 fatal for cuf_test
test.f90:

Hi qqmber,

Thanks for the report. I see two issues here.

The first is that we aren’t decorating the function name correctly when the subroutine contains a cuf kernel and is called without arguments. This only occurs on 32-bit Windows. The workaround is to call cuf_test with arguments. I have sent this issue to our engineers as TPR#18202.

However, on all targets this code will give wrong answers, because the loop bound variables have the “constant” attribute. The workaround is to remove the constant attribute. I’m not sure whether this is a compiler error or an unsupported case for which the compiler should be issuing a semantic error. I have logged this issue as TPR#18203.

Combining the two workarounds:

module test
  implicit none
  real*8,device,allocatable :: arr(:,:,:)
  real*8,allocatable :: arrh(:,:,:)
  integer :: Nx,Ny,Nz
contains

  subroutine cuf_test (x,y,z)
    integer :: x,y,z
    integer :: i,j,k
    !$cuf kernel do(3)<<<512>>>
    do k=1,z
      do j=1,y
        do i=1,x
          arr(i,j,k)=arr(i,j,k)**2
        end do
      end do
    end do

  end subroutine cuf_test

end module

program test_prg
  use cudafor
  use test
  implicit none
  integer ierr

  allocate(arr(32,32,32))
  allocate(arrh(32,32,32))
  Nx=32; Ny=32; Nz=32
  arr=5.

  call cuf_test(Nx,Ny,Nz)
  ierr = cudaThreadSynchronize()
  arrh=arr
  print *, arrh(2,2,2)
end program
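Printing a single element such as arrh(2,2,2) only shows whether that one element changed. Because the reported symptom of the constant-bounds problem is wrong answers, a stronger check is to compare every element against the expected value: each element starts at 5.0, so after the kernel it should be exactly 25.0. The following is a minimal sketch of such a check, assuming the `test` module from the combined workaround above is compiled with it. The program name `test_check` is hypothetical, and the sketch has not been run here; it needs the CUDA Fortran compiler and a GPU.

```fortran
! Hypothetical check program: reuses module "test" (arr, arrh, Nx, Ny, Nz,
! cuf_test) from the combined workaround above.
program test_check
  use cudafor
  use test
  implicit none
  integer :: ierr

  allocate(arr(32,32,32))
  allocate(arrh(32,32,32))
  Nx=32; Ny=32; Nz=32
  arr=5.                      ! every device element starts at 5.0

  call cuf_test(Nx,Ny,Nz)     ! the cuf kernel should square each element
  ierr = cudaThreadSynchronize()
  if (ierr /= cudaSuccess) then
    print *, 'CUDA error: ', cudaGetErrorString(ierr)
    stop
  end if

  arrh=arr                    ! copy device array back to the host
  ! 5.0**2 is exactly 25.0 in double precision, so exact comparison is safe
  if (all(arrh == 25.d0)) then
    print *, 'PASS: all elements squared'
  else
    print *, 'FAIL: ', count(arrh /= 25.d0), ' elements not updated'
  end if
end program
```

If the kernel silently does not execute, the FAIL branch reports all 32768 elements as not updated.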

Thanks again,
Mat

Thank you for your reply, it was very helpful.

qqmber,

TPR 18202 was fixed in the 11.9 release, and is closed now. Thanks for your report.

dave

TPR 18203 - CUF: !$cuf kernel does not execute when loop bounds are “constant” - is fixed in 15.5.

thanks,
dave