Nvcc only partially respects CUDA_HOME ("Input file newer than toolkit")

We have a project that uses both OpenACC and native CUDA, so we use a build environment with the NVIDIA HPC compilers (here 21.2) and a version of CUDA (11.0) chosen for compatibility with other tools. This setup is fragile: whether it works depends on the order in which the nvhpc and cuda modules are loaded into the environment.

The error message is:

nvlink fatal : Input file '/path/to/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/cuda//lib64/libcudadevrt.a:cuda_device_runtime.o' newer than toolkit (112 vs 110) (target: sm_60)

which happens when:

  • The nvcc binary comes from the nvhpc installation, not the cuda installation.
  • CUDA_HOME is set and points to the cuda installation.

In this situation nvcc reports that it will use CUDA 11.0:

which nvcc
/path/to/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/compilers/bin/nvcc

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0

CUDA_HOME=/path/to/cuda-11.0.2-kb4wci

But a trivial example fails to compile with the above error:

echo '' > dummy.cu && nvcc dummy.cu -dc -o dummy.cu.o && nvcc dummy.cu.o -o dummy
nvlink fatal : Input file '/path/to/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/cuda//lib64/libcudadevrt.a:cuda_device_runtime.o' newer than toolkit (112 vs 110)

This appears to be because nvcc derives its library search path from its own location rather than from CUDA_HOME. If I add -dryrun to the second nvcc invocation I see

#$ nvlink --arch=sm_52 --register-link-binaries="…" -m64 -L"/path/to/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/cuda//lib64"…

and this

/path/to/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/cuda/lib64

directory is a symbolic link to

/path/to/nvhpc-21.2-67d2qp/Linux_x86_64/21.2/cuda/11.2/lib64

which belongs to CUDA 11.2, not the 11.0 installation specified by CUDA_HOME and reported by nvcc --version.
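
To make the mismatch easy to see without a full compile, the library directory that nvcc hands to nvlink can be resolved and compared against CUDA_HOME. This is only a rough sketch, assuming the dummy.cu.o from above and the ../../cuda/lib64 layout of this particular nvhpc installation:

# Resolve the CUDA library directory bundled alongside the nvhpc-provided nvcc
nvcc_bin=$(which nvcc)
readlink -f "$(dirname "${nvcc_bin}")/../../cuda/lib64"

# Resolve the library directory of the toolkit that CUDA_HOME points to
readlink -f "${CUDA_HOME}/lib64"

# Ask nvcc which -L path it will pass to nvlink (the -dryrun output goes to stderr)
nvcc dummy.cu.o -o dummy -dryrun 2>&1 | grep nvlink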

If the environment is changed so that nvcc comes from the CUDA installation instead of the HPC SDK, then it seems to work. On our system this means that module load nvhpc cuda works, but module load cuda nvhpc does not.

For the sake of debugging, I note that both plain module load nvhpc, and module load cuda nvhpc followed by unset CUDA_HOME, also avoid the error, although presumably this is because CUDA 11.2 is then used throughout.

At the end of this post I have included a small test script, which might need a little adaptation. Only the cuda_nvhpc branch of the script gives the version mismatch error above.

What can we do to make this setup more robust? The behaviour of nvcc here, where --version reports one CUDA version but the link step uses the libraries of another, seems surprising.
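
For reference, one way to pin things down regardless of module load order would be to prepend CUDA_HOME/bin to PATH before building, so that the nvcc matching CUDA_HOME is found first. This is only a sketch of the workaround already described above, not a fix for the underlying behaviour:

# Workaround sketch: make the nvcc that matches CUDA_HOME win the PATH lookup,
# independent of the order in which the nvhpc and cuda modules were loaded.
if [[ -n "${CUDA_HOME}" && -x "${CUDA_HOME}/bin/nvcc" ]]; then
  export PATH="${CUDA_HOME}/bin:${PATH}"
fi
which nvcc   # should now report ${CUDA_HOME}/bin/nvcc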

nvhpc_version=21.2
first_module=unstable  # site-specific base module; adapt as needed
for config in nvhpc nvhpc_cuda cuda_nvhpc_unset_cuda_home cuda_nvhpc
do
  echo ${config}
  module purge
  if [[ ${config} == nvhpc ]];
  then
    module load ${first_module} nvhpc/${nvhpc_version}
  elif [[ ${config} == nvhpc_cuda ]];
  then
    module load ${first_module} nvhpc/${nvhpc_version} cuda
  elif [[ ${config} == cuda_nvhpc || ${config} == cuda_nvhpc_unset_cuda_home ]];
  then
    module load ${first_module} cuda nvhpc/${nvhpc_version}
  fi
  if [[ ${config} == cuda_nvhpc_unset_cuda_home ]];
  then
    unset CUDA_HOME
  fi
  echo which nvcc
  which nvcc
  echo nvcc --version
  nvcc --version
  echo "CUDA_HOME=${CUDA_HOME}"
  echo '' > dummy.cu && nvcc dummy.cu -dc -o dummy.cu.o && nvcc dummy.cu.o -o dummy
  nvcc dummy.cu.o -o dummy -dryrun
done