How to compile an nvfortran program with HPC-SDK 21.1 and CUDA toolkit 9.1

Hi, I am a complete newbie to nvfortran (pgfortran). I have a problem compiling a simple program, which I list below, but I think the program itself is irrelevant: no program compiles. Here is the info:

  1. The Nvidia driver version is 390xx. I cannot install a newer version on this laptop, where I use an external monitor with Optimus-manager.
  2. The driver is compatible with Cuda toolkit 9.1 or older.
  3. Only the latest HPC-SDK version, 21.1, is available on the Nvidia site. Older versions of the PGI Fortran compiler are available on the pgroup website, but only for a fee, so that is not an option for me.
  4. I direct nvfortran (or pgfortran or pgf95) to use Cuda toolkit 9.1:

nvfortran CUDA_HOME=/home/popsi/Downloads/cuda-9.1 -cuda -cudalibs saxpy.cuf -L/home/popsi/Downloads/cuda-9.1/lib64

and get the following errors:

/usr/bin/ld: cannot find -lcutensor
/usr/bin/ld: cannot find -lnccl
/usr/bin/ld: cannot find -lnvshmem
pgacclnk: child process exit status 1: /usr/bin/ld

  5. Compiling with the Cuda toolkit that comes with HPC-SDK 21.1 by:

pgfortran CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/21.1/cuda/11.2 -cuda saxpy.cuf -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.1/math_libs/11.2/targets/x86_64-linux/lib

is successful, but the executable reports the error:
0: ALLOCATE: 160000 bytes requested; not enough memory: 3(initialization error)

This happens however small the array in the code is.
I believe it is due to an incompatibility between Nvidia driver 390xx and the Cuda Toolkit 11 installed with HPC-SDK 21.1.

So my question is how to compile the program.
The program is the first simple example given in the tutorial (saxpy.cuf).

module mathOps
contains
attributes(global) subroutine saxpy(x, y, a)
implicit none
real :: x(:), y(:)
real, value :: a
integer :: i, n
n = size(x)
i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
if (i <= n) y(i) = y(i) + a*x(i)
end subroutine saxpy
end module mathOps

program testSaxpy
use mathOps
use cudafor
implicit none
integer, parameter :: N = 40000
real :: x(N), y(N), a
real, device :: x_d(N), y_d(N)
type(dim3) :: grid, tBlock

tBlock = dim3(256,1,1)
grid = dim3(ceiling(real(N)/tBlock%x),1,1)

x = 1.0; y = 2.0; a = 2.0
x_d = x
y_d = y
call saxpy<<<grid, tBlock>>>(x_d, y_d, a)
y = y_d
write(*,*) 'Max error: ', maxval(abs(y-4.0))
end program testSaxpy

Hi igorpopv77,

Sorry, but the 21.1 NVHPC SDK doesn’t support, and isn’t tested with, CUDA 9.1.

The problem with #4 is that you’re using the flag “-cudalib” which will link against all current CUDA Libraries, some of which didn’t exist with CUDA 9.1. Since this source doesn’t use any CUDA Libraries, you can try compiling without the flag (or use the flag with a specific library, ex. “-cudalib=cublas”).
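To make those two options concrete, here is a small dry-run sketch in shell. It only prints the command lines and does not invoke the compiler. The helper name `nvf_cmd` and the output name derived from the source file are made up for illustration; `-cuda` and `-cudalib=cublas` are the flags discussed above.

```shell
#!/bin/sh
# nvf_cmd SOURCE [LIBRARY]
# Hypothetical helper: prints (does not run) an nvfortran command line.
# With no LIBRARY, -cudalib is left out entirely; with one, only that
# single library is requested, e.g. -cudalib=cublas.
nvf_cmd() {
    src=$1
    lib=${2:-}
    out=${src%.*}            # saxpy.cuf -> saxpy (assumed output name)
    if [ -n "$lib" ]; then
        printf 'nvfortran -cuda -cudalib=%s %s -o %s\n' "$lib" "$src" "$out"
    else
        printf 'nvfortran -cuda %s -o %s\n' "$src" "$out"
    fi
}

nvf_cmd saxpy.cuf           # no CUDA libraries requested
nvf_cmd saxpy.cuf cublas    # request cuBLAS only
```

Printing the command first makes it easy to see exactly which libraries the linker will be asked for before running it for real.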

For #5, CUDA drivers aren’t forward compatible so a 390 driver wouldn’t be able to run a CUDA 11.2 built binary.
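A quick way to catch this kind of mismatch before running anything is to compare the installed driver branch with the minimum branch the CUDA release needs. The sketch below is only an illustration: `driver_major` and `driver_ok` are hypothetical helper names, and the minimum branch of 450 used for CUDA 11.x is an assumption that should be checked against the CUDA release notes.

```shell
#!/bin/sh
# driver_major VERSION: print the branch number of a driver version
# string, e.g. "390.143" -> "390".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# driver_ok VERSION MIN_BRANCH: succeed when the driver branch is at
# least MIN_BRANCH. Take MIN_BRANCH from the CUDA release notes.
driver_ok() {
    [ "$(driver_major "$1")" -ge "$2" ]
}

# Check the installed driver, but only if nvidia-smi exists and works.
# The 450 below is an assumed minimum branch for CUDA 11.x binaries.
if command -v nvidia-smi >/dev/null 2>&1 &&
   all=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null) &&
   [ -n "$all" ]; then
    installed=$(printf '%s\n' "$all" | head -n 1)   # first GPU only
    if ! driver_ok "$installed" 450; then
        echo "driver $installed is too old for a CUDA 11.x binary"
    fi
fi
```

With a 390 driver, as in this thread, `driver_ok 390.143 450` fails, which matches the initialization error reported by the executable.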


Thanks a lot, Mat, for your answer! I am going through the tutorial in order to learn to use nvfortran. I will need to compile some CUDA Fortran software (e.g. libxc, Quantum Espresso, GPAW, etc.), which is written in Fortran and usually uses cuBLAS. But I started with the simple example first to get familiar with how to compile, the command line options, etc.

When I remove both -cuda and -cudalibs from the command line, or retain just -cuda, I get the following errors:

/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/21.1/compilers/lib/ undefined reference to `__cudaRegisterFatBinaryEnd'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/21.1/compilers/lib/ undefined reference to `__cudaPopCallConfiguration'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/21.1/compilers/lib/ undefined reference to `__cudaPushCallConfiguration'

OK, I understand that HPC-SDK has not been tested with Cuda 9.1. Unfortunately I have no other choice, as the Nvidia driver on my machine is 390, so I cannot use the newer CUDA 11.2. As far as I can see, the only option is to use an older HPC-SDK, for instance ver. 17, which comes with an older CUDA. Can you tell me whether it is possible to download older versions, and provide me a link? I cannot find one.


Hi Igor,

I was afraid of that. The CUDA Fortran runtime library is compiled for compatibility with a particular CUDA version, with CUDA 10.1 being the oldest version we ship with 21.1.
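To see which CUDA versions a given SDK install actually bundles, you can list the version-named directories under its cuda folder. This is a rough sketch: `list_bundled_cuda` is a hypothetical helper, and the directory layout is inferred from the `/opt/nvidia/hpc_sdk/Linux_x86_64/21.1/cuda/11.2` path used earlier in this thread, so treat it as an assumption for other installs.

```shell
#!/bin/sh
# list_bundled_cuda SDK_CUDA_DIR: print the version-named subdirectories
# (such as 10.1 or 11.2), one per line. Prints nothing if the directory
# is missing or holds no version-named subdirectories.
list_bundled_cuda() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        case $name in
            [0-9]*.[0-9]*) printf '%s\n' "$name" ;;   # keep only N.M names
        esac
    done
}

# Path taken from the compile command earlier in the thread.
list_bundled_cuda /opt/nvidia/hpc_sdk/Linux_x86_64/21.1/cuda
```

If the CUDA version your driver supports is not in that list, the bundled CUDA Fortran runtime has nothing built to match it.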

"Can you tell me whether it is possible to download older versions, and provide me a link?"

Archive releases are available to organizations that have purchased support from NVIDIA or have an active legacy PGI Professional License. If you do have support, you can access the archive from NVIDIA’s support portal. More information can be found under the “HPC Compilers Support Services” section at: High Performance Computing (HPC) SDK | NVIDIA


Thanks Mat! I have managed to install a newer Nvidia driver, so now the latest HPC-SDK works, at least for the simple example, which is all I have tested so far.

Great! Glad you were able to install a more recent driver and get things working.