libcudafor.so: undefined reference to...

Hi everybody,
I installed the PGI community edition, version 17.4, both on my desktop and on my laptop. To check the installations I’m trying to compile the following code:

program test_cublasCgemm 
use cudafor 
interface 
  subroutine ccgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc ) bind(c,name='cublasCgemm') 
   use iso_c_binding 
   integer(c_int), value :: m, n, k, lda, ldb, ldc 
   complex(c_float), device, dimension(m,n) :: a, b, c 
   complex(c_float), value :: alpha, beta 
   character(kind=c_char), value :: transa, transb 
  end subroutine ccgemm 
end interface 

complex, device, allocatable, dimension(:,:) :: dA, dB, dC 
complex, allocatable, dimension(:,:) :: a, b, c, c1 
complex :: alpha = (1.0e0,0.0e0) 
complex :: beta  = (0.0e0,0.0e0) 
real :: t1, t2, t3, tt, gflops 
integer :: i, j, k 

print *, "Enter N: " 
read(5,*) n 

allocate(a(n,n), b(n,n), c(n,n), c1(n,n)) 
allocate(dA(n,n), dB(n,n), dC(n,n)) 

a = (2.0e0,1.0e0) 
b = (1.5e0,0.0e0) 
c = (-9.9e0,0.0e0) 

call cpu_time(t1) 
dA = a 
dB = b 
if (beta .ne. (0.0e0,0.0e0)) then 
  dC = c 
endif 
call ccgemm('n', 'n', n, n, n, alpha, dA, n, dB, n, beta, dC, n) 
c1 = dC 
call cpu_time(t2) 
call cgemm('n', 'n', n, n, n, alpha, a, n, b, n, beta, c, n) 
call cpu_time(t3) 

print *, "Checking results...." 

do j = 1, n 
  do i = 1, n 
    if (c(i,j)-c1(i,j) .ne. (0.0e0,0.0e0)) then 
      print *, "error:  ",i,j
      print *, c(i,j)
      print *, c1(i,j) 
    endif 
  enddo 
enddo 

gflops = (real(n) * real(n) * real(n) * 2.0) / 1000000000.0 
tt = t2 - t1 
print *, "Total Time GPU: ",tt 
print *, "Total GPU gflops: ",gflops/tt 
tt = t3 - t2
print *, "Total Time Host: ",tt 
print *, "Total Host gflops: ",gflops/tt 
print *, "Done...." 
end

with the following compilation command for both systems:

pgfortran -Mcuda -o cgemm_inter cgemm_inter.F90 -lcublas -lblas

While the compilation is successful on the laptop returning a working executable, on the desktop I’m getting the following errors:

/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasDgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasSgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasCgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasZgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrcolor'
pgacclnk: child process exit status 1: /usr/bin/ld

Do you have any hints about what the cause could be?
Let me add that on both systems the compiler is installed in the default path

/opt/pgi

and that the same environment variables are defined.
Finally, below is the full output obtained with the -v flag for the unsuccessful compilation.

Export PGI=/opt/pgi

/opt/pgi/linux86-64/17.4/bin/pgf901 cgemm_inter.F90 -opt 1 -nohpf -nostatic -x 19 0x400000 -quad -x 59 4 -x 15 2 -x 49 0x400004 -x 51 0x20 -x 57 0x4c -x 58 0x10000 -x 124 0x1000 -tp nehalem -x 57 0xfb0000 -x 58 0x78031040 -x 47 0x08 -x 48 4608 -x 49 0x100 -x 120 0x200 -stdinc /opt/pgi/linux86-64/17.4/include-gcc49:/opt/pgi/linux86-64/17.4/include:/usr/lib/gcc/x86_64-linux-gnu/4.9/include:/usr/local/include:/usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed:/usr/include/x86_64-linux-gnu:/usr/include -cmdline '+pgfortran cgemm_inter.F90 -Mcuda -o cgemm_inter -lcublas -lblas -v' -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__ -def __NO_MATH_INLINES -def __LP64__ -def __x86_64 -def __x86_64__ -def __LONG_MAX__=9223372036854775807L -def '__SIZE_TYPE__=unsigned long int' -def '__PTRDIFF_TYPE__=long int' -def __THROW= -def __extension__= -def __amd_64__amd64__ -def __k8 -def __k8__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def __STDC_HOSTED__ -def _CUDA -preprocess -freeform -vect 48 -y 54 1 -def __CUDA_API_VERSION=7050 -x 70 0x40000000 -x 189 0x8000 -y 163 0xc0000000 -x 189 0x10 -x 137 1 -modexport /tmp/pgfortranYpZdwYZad_2J.cmod -modindex /tmp/pgfortrancpZdgfsOiMBo.cmdx -output /tmp/pgfortranspZd2orPyeET.ilm
  0 inform,   0 warnings,   0 severes, 0 fatal for test_cublascgemm
PGF90/x86-64 Linux 17.4-0: compilation successful

/opt/pgi/linux86-64/17.4/bin/pgf902 /tmp/pgfortranspZd2orPyeET.ilm -fn cgemm_inter.F90 -opt 1 -x 51 0x20 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -x 59 4 -tp nehalem -x 120 0x1000 -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 49 0x100 -x 120 0x200 -astype 0 -x 70 0x40000000 -x 124 1 -x 189 0x8000 -y 163 0xc0000000 -x 189 0x10 -y 189 0x4000000 -x 137 1 -x 121 0xc00 -x 180 0x4000000 -x 176 0x100 -cudacap 30 -cudacap 35 -cudacap 50 -cudaver 7.5 -cmdline '+pgfortran cgemm_inter.F90 -Mcuda -o cgemm_inter -lcublas -lblas -v' -asm /tmp/pgfortranspZd2TItnAeC.s
  0 inform,   0 warnings,   0 severes, 0 fatal for test_cublascgemm
PGF90/x86-64 Linux 17.4-0: compilation successful

/usr/bin/as /tmp/pgfortranspZd2TItnAeC.s -I/opt/pgi/linux86-64/2017/cuda/7.5/include/ -o /tmp/pgfortranYpZdwb8GwAy8.o

/opt/pgi/linux86-64/17.4/bin/pgappend -noerror /tmp/pgfortranYpZdwb8GwAy8.o -name .IPDINFO /tmp/pgfortranYpZdwYZad_2J.cmod -name .IPEINFO /tmp/pgfortrancpZdgfsOiMBo.cmdx

/opt/pgi/linux86-64/17.4/bin/pgacclnk -nvidia /opt/pgi/linux86-64/17.4/bin/pgnvd -cuda7.5 -cudalink -computecap=30 -computecap=35 -computecap=50 -v /usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o /opt/pgi/linux86-64/17.4/lib/trace_init.o /usr/lib/gcc/x86_64-linux-gnu/4.9/crtbegin.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /opt/pgi/linux86-64/17.4/lib/pgi.ld -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -rpath /opt/pgi/linux86-64/17.4/lib -rpath /opt/pgi/linux86-64/2017/cuda/7.5/lib64 -rpath /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -o cgemm_inter -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -lcudafor -lcudafor -lcudaforblas -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s /usr/lib/gcc/x86_64-linux-gnu/4.9/crtend.o /usr/lib64/crtn.o
/opt/pgi/linux86-64/17.4/bin/pgnvd /opt/pgi/linux86-64/17.4/lib/trace_init.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -lcudafor -lcudafor -lcudaforblas -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s -dolink -cuda7.5 -computecap 30 -o /tmp/pgcudaLuZdV7sVsZOj.cubin -regobj /tmp/pgcudareg9uZd3uWML_ac.o -v
Export LD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib64:/opt/intel/Compiler/11.1/059/lib/intel64:/opt/intel/Compiler/11.1/059/ipp/em64t/sharedlib:/opt/intel/Compiler/11.1/059/mkl/lib/em64t:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/mpich/3.1.4-gfortran/lib:/opt/petsc/3.4.4/lib:/opt/hdf5/1.8.13/lib
Export DYLD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
Export PATH=/opt/intel/Compiler/11.1/059/bin/intel64:/opt/mpich/3.1.4-gfortran/bin:/home/valerio/Programmi/MCNP/MCNP_CODE/bin:/opt/pgi/linux86-64/2017/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

/opt/pgi/linux86-64/2017/cuda/7.5/bin/nvlink --arch=sm_30 -m64 -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 /opt/pgi/linux86-64/17.4/lib/trace_init.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -lcudafor -lcudafor -lcudaforblas -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s --register-link-binaries=/tmp/pgcudaLuZdV7sVsZOj.reg.c -o /tmp/pgcudaLuZdV7sVsZOj.cubin

/usr/bin/gcc -m64 -c -I. -o/tmp/pgcudareg9uZd3uWML_ac.o -DREGFILE="/tmp/pgcudaLuZdV7sVsZOj.reg.c" /opt/pgi/linux86-64/17.4/include_acc/linkstub75.c
Unlinking /tmp/pgcudaLuZdV7sVsZOj.reg.c
/opt/pgi/linux86-64/17.4/bin/pgnvd /opt/pgi/linux86-64/17.4/lib/trace_init.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -lcudafor -lcudafor -lcudaforblas -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s -dolink -cuda7.5 -computecap 35 -o /tmp/pgcudanuZdNzkqQmpk.cubin -regobj /tmp/pgcudareg9uZd3uWML_ac.o -v
Export LD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib64:/opt/intel/Compiler/11.1/059/lib/intel64:/opt/intel/Compiler/11.1/059/ipp/em64t/sharedlib:/opt/intel/Compiler/11.1/059/mkl/lib/em64t:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/mpich/3.1.4-gfortran/lib:/opt/petsc/3.4.4/lib:/opt/hdf5/1.8.13/lib
Export DYLD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
Export PATH=/opt/intel/Compiler/11.1/059/bin/intel64:/opt/mpich/3.1.4-gfortran/bin:/home/valerio/Programmi/MCNP/MCNP_CODE/bin:/opt/pgi/linux86-64/2017/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

/opt/pgi/linux86-64/2017/cuda/7.5/bin/nvlink --arch=sm_35 -m64 -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 /opt/pgi/linux86-64/17.4/lib/trace_init.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -lcudafor -lcudafor -lcudaforblas -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s --register-link-binaries=/tmp/pgcudanuZdNzkqQmpk.reg.c -o /tmp/pgcudanuZdNzkqQmpk.cubin

/usr/bin/gcc -m64 -c -I. -o/tmp/pgcudareg9uZd3uWML_ac.o -DREGFILE="/tmp/pgcudanuZdNzkqQmpk.reg.c" /opt/pgi/linux86-64/17.4/include_acc/linkstub75.c
Unlinking /tmp/pgcudanuZdNzkqQmpk.reg.c
/opt/pgi/linux86-64/17.4/bin/pgnvd /opt/pgi/linux86-64/17.4/lib/trace_init.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -lcudafor -lcudafor -lcudaforblas -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s -dolink -cuda7.5 -computecap 50 -o /tmp/pgcuda1uZdFM9RcZOR.cubin -regobj /tmp/pgcudareg9uZd3uWML_ac.o -v
Export LD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib64:/opt/intel/Compiler/11.1/059/lib/intel64:/opt/intel/Compiler/11.1/059/ipp/em64t/sharedlib:/opt/intel/Compiler/11.1/059/mkl/lib/em64t:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/mpich/3.1.4-gfortran/lib:/opt/petsc/3.4.4/lib:/opt/hdf5/1.8.13/lib
Export DYLD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
Export PATH=/opt/intel/Compiler/11.1/059/bin/intel64:/opt/mpich/3.1.4-gfortran/bin:/home/valerio/Programmi/MCNP/MCNP_CODE/bin:/opt/pgi/linux86-64/2017/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

/opt/pgi/linux86-64/2017/cuda/7.5/bin/nvlink --arch=sm_50 -m64 -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 /opt/pgi/linux86-64/17.4/lib/trace_init.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -lcudafor -lcudafor -lcudaforblas -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s --register-link-binaries=/tmp/pgcuda1uZdFM9RcZOR.reg.c -o /tmp/pgcuda1uZdFM9RcZOR.cubin

/usr/bin/gcc -m64 -c -I. -o/tmp/pgcudareg9uZd3uWML_ac.o -DREGFILE="/tmp/pgcuda1uZdFM9RcZOR.reg.c" /opt/pgi/linux86-64/17.4/include_acc/linkstub75.c
Unlinking /tmp/pgcuda1uZdFM9RcZOR.reg.c
/opt/pgi/linux86-64/17.4/bin/pgnvd -fatobj /tmp/pgcudafatDuZdxAk6Bppu.o -o /tmp/pgcudafatDuZdxAk6Bppu.o -cuda7.5 -v -sm 30 /tmp/pgcudaLuZdV7sVsZOj.cubin -sm 35 /tmp/pgcudanuZdNzkqQmpk.cubin -sm 50 /tmp/pgcuda1uZdFM9RcZOR.cubin
Export LD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib64:/opt/intel/Compiler/11.1/059/lib/intel64:/opt/intel/Compiler/11.1/059/ipp/em64t/sharedlib:/opt/intel/Compiler/11.1/059/mkl/lib/em64t:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/mpich/3.1.4-gfortran/lib:/opt/petsc/3.4.4/lib:/opt/hdf5/1.8.13/lib
Export DYLD_LIBRARY_PATH=/opt/pgi/linux86-64/2017/cuda/7.5/nvvm/lib:/opt/intel/Compiler/11.1/059/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
Export PATH=/opt/intel/Compiler/11.1/059/bin/intel64:/opt/mpich/3.1.4-gfortran/bin:/home/valerio/Programmi/MCNP/MCNP_CODE/bin:/opt/pgi/linux86-64/2017/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

/opt/pgi/linux86-64/2017/cuda/7.5/bin/fatbinary --64 --create=/tmp/pgnvdmKZdKrO1nBG5.fat --key=x_dlink --ident=/tmp/pgcudafatDuZdxAk6Bppu.o --image=profile=sm_30,file=/tmp/pgcudaLuZdV7sVsZOj.cubin --image=profile=sm_35,file=/tmp/pgcudanuZdNzkqQmpk.cubin --image=profile=sm_50,file=/tmp/pgcuda1uZdFM9RcZOR.cubin

/opt/pgi/linux86-64/17.4/bin/pgimport /tmp/pgnvdSKZdeEQ4XYNk.s /tmp/pgnvdmKZdKrO1nBG5.fat -var __PGI_CUDA_LOC -ccname __PGI_CUDA_CAP -cc30 -cc35 -cc50

/usr/bin/as -o /tmp/pgcudafatDuZdxAk6Bppu.o /tmp/pgnvdSKZdeEQ4XYNk.s
Unlinking /tmp/pgnvdmKZdKrO1nBG5.fat
Unlinking /tmp/pgnvdSKZdeEQ4XYNk.s
/usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o /tmp/pgcudafatDuZdxAk6Bppu.o /tmp/pgcudareg9uZd3uWML_ac.o /opt/pgi/linux86-64/17.4/lib/trace_init.o /usr/lib/gcc/x86_64-linux-gnu/4.9/crtbegin.o /opt/pgi/linux86-64/17.4/lib/initmp.o /opt/pgi/linux86-64/17.4/lib/f90main.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /opt/pgi/linux86-64/17.4/lib/pgi.ld -L/opt/pgi/linux86-64/17.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.9 /tmp/pgfortranYpZdwb8GwAy8.o -lcublas -lblas -rpath /opt/pgi/linux86-64/17.4/lib -rpath /opt/pgi/linux86-64/2017/cuda/7.5/lib64 -rpath /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -o cgemm_inter -L/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib64 -lcudafor -lcudafor -lcudaforblas -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 -lcudart -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgmp -lnuma -lpthread -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s /usr/lib/gcc/x86_64-linux-gnu/4.9/crtend.o /usr/lib64/crtn.o
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasDgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasSgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasCgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasZgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrcolor'
pgacclnk: child process exit status 1: /usr/bin/ld
unlink /tmp/pgcudafatDuZdxAk6Bppu.o
unlink /tmp/pgcudareg9uZd3uWML_ac.o
unlink /tmp/pgcudaLuZdV7sVsZOj.cubin
unlink /tmp/pgcudanuZdNzkqQmpk.cubin
unlink /tmp/pgcuda1uZdFM9RcZOR.cubin
pgfortran-Fatal-linker completed with exit code 1

Unlinking /tmp/pgfortranspZd2orPyeET.ilm
Unlinking /tmp/pgfortranIpZdMfBem48I.stb
Unlinking /tmp/pgfortranYpZdwYZad_2J.cmod
Unlinking /tmp/pgfortrancpZdgfsOiMBo.cmdx
Unlinking /tmp/pgfortranspZd2TItnAeC.s
Unlinking /tmp/pgfortranIpZdMC4GvgfO.ll
Unlinking /tmp/pgfortranYpZdwb8GwAy8.o

Thanks and …sorry for the length of this post.

Valerio

To give you some more information, here is the output of the pgaccelinfo command on both machines.

Desktop:

CUDA Driver Version:           7050
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  352.79  Wed Jan 13 16:17:53 PST 2016

Device Number:                 0
Device Name:                   GeForce GTX 660 Ti
Device Revision Number:        3.0
Global Memory Size:            2146762752
Number of Multiprocessors:     7
Number of SP Cores:            1344
Number of DP Cores:            448
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    980 MHz
Execution Timeout:             Yes
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   No
Memory Clock Rate:             3004 MHz
Memory Bus Width:              192 bits
L2 Cache Size:                 393216 bytes
Max Threads Per SMP:           2048
Async Engines:                 1
Unified Addressing:            Yes
Managed Memory:                Yes
PGI Compiler Option:           -ta=tesla:cc30

Laptop:

CUDA Driver Version:           8000
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  375.39  Tue Jan 31 20:47:00 PST 2017

Device Number:                 0
Device Name:                   GeForce 940M
Device Revision Number:        5.0
Global Memory Size:            2100232192
Number of Multiprocessors:     3
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1176 MHz
Execution Timeout:             Yes
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   No
Memory Clock Rate:             900 MHz
Memory Bus Width:              64 bits
L2 Cache Size:                 1048576 bytes
Max Threads Per SMP:           2048
Async Engines:                 1
Unified Addressing:            Yes
Managed Memory:                Yes
PGI Compiler Option:           -ta=tesla:cc50

Valerio

I don’t see

-lcusparse

in your library list.

dave

Thanks Dave,
tomorrow morning, once I am back in the office, I will try to compile the code on my desktop with -lcusparse added.
However, I do not understand why the code compiled correctly on the laptop without that option but not on the desktop.
Do you have any hints?

Valerio

The compilation of the code with the command

pgfortran -Mcuda -o cgemm_inter cgemm_inter.F90 -lcublas -lblas -lcusparse

i.e. adding -lcusparse, returns exactly the same list of errors.

/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasDgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasSgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrgemm2_bufferSizeExt'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseZcsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrgemm2'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasCgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/./libcudaforwrapblas.so: undefined reference to `cublasZgetrsBatched'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseCcsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseScsrcolor'
/opt/pgi/linux86-64/17.4/lib/libcudafor.so: undefined reference to `cusparseDcsrcolor'
pgacclnk: child process exit status 1: /usr/bin/ld

Any other ideas?

My apologies for the lazy response.

Your issue is that when a program is built on one system and run on another
with the same installation, it fails to find its dynamic libraries.

Your original build line was enough. However, the location of the libs
needed at runtime is different.

% pgf90 -Mcuda -o test test.F90 -lcublas -lblas
% ldd test
linux-vdso.so.1 => (0x00007fffadaff000)
libcublas.so.7.5 => /home/tull/Downloads/pgi174/linux86-64/2017/cuda/7.5/lib64/libcublas.so.7.5 (0x00007fe23ac48000)
libblas.so.0 => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libblas.so.0 (0x00007fe237f2d000)
libcudafor.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libcudafor.so (0x00007fe234dd0000)
libcusparse.so.7.5 => /home/tull/Downloads/pgi174/linux86-64/2017/cuda/7.5/lib64/libcusparse.so.7.5 (0x00007fe2328a5000)
libcurand.so.7.5 => /home/tull/Downloads/pgi174/linux86-64/2017/cuda/7.5/lib64/libcurand.so.7.5 (0x00007fe22f03c000)
libcudaforwrapblas.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libcudaforwrapblas.so (0x00007fe22ee06000)
libcudart.so.7.5 => /home/tull/Downloads/pgi174/linux86-64/2017/cuda/7.5/lib64/libcudart.so.7.5 (0x00007fe22eba8000)
libpgf90rtl.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libpgf90rtl.so (0x00007fe22e980000)
libpgf90.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libpgf90.so (0x00007fe22e3ca000)
libpgf90_rpm1.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libpgf90_rpm1.so (0x00007fe22e1c8000)
libpgf902.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libpgf902.so (0x00007fe22dfb4000)
libpgftnrtl.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libpgftnrtl.so (0x00007fe22dd7e000)
libpgmp.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libpgmp.so (0x00007fe22dafe000)
libnuma.so.1 => /usr/lib/gcc/x86_64-redhat-linux/4.4.6/…/…/…/…/lib64/libnuma.so.1 (0x0000003723800000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000370c800000)
libpgc.so => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libpgc.so (0x00007fe22d801000)
librt.so.1 => /lib64/librt.so.1 (0x000000370d000000)
libm.so.6 => /lib64/libm.so.6 (0x000000370bc00000)
libc.so.6 => /lib64/libc.so.6 (0x000000370c000000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003718000000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000370c400000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-redhat-linux/4.4.6/…/…/…/…/lib64/libstdc++.so.6 (0x0000003719800000)
/lib64/ld-linux-x86-64.so.2 (0x000000370b800000)


So libpgc.so, for example, is expected in
/home/tull/Downloads/pgi174/linux86-64/17.4/lib/
and if instead it is in
/opt/pgi/linux86-64/17.4/lib/

you need to add that directory to LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/opt/pgi/linux86-64/17.4/lib/

and the executable will then find it.
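
Note that the export line above replaces whatever LD_LIBRARY_PATH held before, and the -v output earlier in the thread shows the desktop already has Intel, MPICH, PETSc and HDF5 directories in it. A minimal sketch of a safer variant, assuming a bash-like shell; prepend_ld_path is a hypothetical helper name, not something shipped with PGI:

```shell
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH
# while keeping the entries that are already there.
prepend_ld_path() {
    _dir=${1%/}                       # drop a single trailing slash
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$_dir:"*) ;;               # already listed, leave the value alone
        *) LD_LIBRARY_PATH="$_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example call (path taken from the post above):
# prepend_ld_path /opt/pgi/linux86-64/17.4/lib/
```

Prepending rather than appending matters here because the dynamic loader searches the listed directories from left to right.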

An easier option is to compile with -Bstatic_pgi,
which links all PGI libs statically instead of resolving them at runtime.

% pgf90 -Mcuda -o test test.F90 -lcublas -lblas -Bstatic_pgi
% ldd test
linux-vdso.so.1 => (0x00007fff6f1ff000)
libcublas.so.7.5 => /home/tull/Downloads/pgi174/linux86-64/2017/cuda/7.5/lib64/libcublas.so.7.5 (0x00007f9575d69000)
libblas.so.0 => /home/tull/Downloads/pgi174/linux86-64/17.4/lib/libblas.so.0 (0x00007f957304e000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000370c400000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000370c800000)
librt.so.1 => /lib64/librt.so.1 (0x000000370d000000)
libm.so.6 => /lib64/libm.so.6 (0x000000370bc00000)
libc.so.6 => /lib64/libc.so.6 (0x000000370c000000)
/lib64/ld-linux-x86-64.so.2 (0x000000370b800000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003718000000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-redhat-linux/4.4.6/…/…/…/…/lib64/libstdc++.so.6 (0x0000003719800000)

The -Bstatic_pgi ldd result above tells me that libcublas is not linked statically (a bug), so you will need to add the libcublas.so directory
to LD_LIBRARY_PATH.

Sorry, but perhaps I did not explain myself correctly: I’m not building the code on one system and then running it on another.
I’m instead building the code separately on both systems, using the same version of the PGI Community Edition (17.4) and the same build command.
While I’m able to build the code on the laptop, on the desktop the build fails with the errors shown above.

So the issue is that the installation is either incomplete or corrupted,
since the example should build.
Does the executable built on the laptop run on the desktop?
That would be good to know.

Either post the following or send it to trs@pgroup.com:

On the laptop
pgfortran -V

pgf90 -Mcuda -o test test.F90 -lcublas -lblas -v -Wl,-t

./test


On the desktop
pgfortran -V

pgf90 -Mcuda -o test test.F90 -lcublas -lblas -v -Wl,-t


This should determine where the failure occurs. If the failure
occurs with a library, you might compare the library size and
checksum on each installation to see if there is a difference.
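
The size and checksum comparison could be sketched as follows, assuming GNU coreutils (wc, sha256sum, cut) are available on both machines. lib_fingerprint is a hypothetical helper name; run it on the same library path on each installation and compare the two output lines by eye.

```shell
# Hypothetical helper: print "<size in bytes> <sha256>" for one file.
lib_fingerprint() {
    _size=$(wc -c < "$1") || return 1          # fails if the file is missing
    _sum=$(sha256sum "$1" | cut -d' ' -f1) || return 1
    printf '%s %s\n' "$((_size))" "$_sum"       # $(( )) strips any padding from wc
}

# Example call, to be run on both the laptop and the desktop:
# lib_fingerprint /opt/pgi/linux86-64/17.4/lib/libcudafor.so
```

Identical lines on both machines suggest the library file itself is not the difference, which points the search toward what else the linker picks up.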

dave

Tomorrow is a national holiday. Next Monday, once I am back in the office, I will run the commands you suggested and let you know the results.
Have a nice weekend, everybody.

Hi,
I ran the commands you suggested on both the laptop and the desktop:

pgfortran -V
pgf90 -Mcuda -o test test.F90 -lcublas -lblas -v -Wl,-t

Looking at the outputs, I found that on the desktop the compiler is trying to link the wrong cublas library:


-lcublas (/usr/lib64/libcublas.so)

The laptop is instead linking the proper one:


-lcublas (/opt/pgi/linux86-64/2017/cuda/7.5/lib64/libcublas.so)

Adding

-L/opt/pgi/linux86-64/2017/cuda/7.5/lib64

to the build command makes it complete without errors, and the linked cublas library is now the proper one:


-lcublas (/opt/pgi/linux86-64/2017/cuda/7.5/lib64/libcublas.so)

The code then runs correctly, producing:

valerio@giusti-145 ~/Projects/test_cuda_pgi $ ./cgemm_inter
Enter N:
1500
Checking results....
Total Time GPU: 0.1212480
Total GPU gflops: 55.67102
Total Time Host: 1.055503
Total Host gflops: 6.395054
Done....
valerio@giusti-145 ~/Projects/test_cuda_pgi $

I guess I should remove the old CUDA Toolkit (it was installed through the packages available on the backports repository of Linux Mint Debian Edition 2).
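
For anyone hitting the same symptom: in the failing -v output, -L/usr/lib64 comes before -L/opt/pgi/linux86-64/2017/cuda/7.5/lib64 on the ld line, so the older libcublas.so in /usr/lib64 is found first. The sketch below is a simplified imitation of that lookup, not the linker's actual implementation. first_lib_match is a hypothetical helper name, and it only checks for lib<name>.so and lib<name>.a in each directory, in the order given.

```shell
# Hypothetical helper: report the first lib<name>.so or lib<name>.a found
# when walking the given directories in order, roughly as ld does for -l<name>.
first_lib_match() {
    _name=$1
    shift
    for _d in "$@"; do
        for _ext in so a; do          # shared library is preferred over static
            if [ -e "$_d/lib$_name.$_ext" ]; then
                printf '%s\n' "$_d/lib$_name.$_ext"
                return 0
            fi
        done
    done
    return 1                           # not found in any directory
}

# Example call, with the directories in the order of the failing ld line:
# first_lib_match cublas /opt/pgi/linux86-64/17.4/lib /usr/lib64 \
#     /usr/lib/gcc/x86_64-linux-gnu/4.9 /opt/pgi/linux86-64/2017/cuda/7.5/lib64
```

The authoritative check remains the -Wl,-t trace used above, since it shows what the linker really opened.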

Thanks Dave for your help.

Or you can edit the localrc file in your bin directory, and correct the
bad path to libcuda.

Or you can edit/create a siterc file in your home directory that
corrects the situation just for yourself. siterc is read after localrc
(see the -dryrun output), so if you change the path to cuda libs there,
the driver will use that path instead of localrc’s.

Thanks for your persistence and patience.

dave