HPC SDK 20.9 and CUDA 10.1

Hi.

I am trying to install HPC SDK 20.9 on an IBM Power System AC922 machine running CentOS 7.6 with NVIDIA driver version 418.181.07. As far as I understand from the release notes, these software versions are supported (https://docs.nvidia.com/hpc-sdk/archive/20.9/hpc-sdk-release-notes/index.html). However, after installing the SDK and setting the environment variables, I cannot compile a single example from “CUDA-Fortran”, “CUDA-Libraries”, etc. In every case the build fails at the link stage because the linker cannot find the required libraries.

[user@host SDK]# make bandwidth_test
cd bandwidthTest; make build; make run; make clean
make[1]: Entering directory `/home/user/Temp/HPC_SDK/CUDA-Fortran/SDK/bandwidthTest'
nvfortran  -fast -o bandwidthTest.out bandwidthTest.cuf
/usr/bin/ld: cannot find libcurand.so
pgacclnk: child process exit status 1: /usr/bin/ld
make[1]: *** [build] Error 2
make[1]: Leaving directory `/home/user/Temp/HPC_SDK/CUDA-Fortran/SDK/bandwidthTest'
make[1]: Entering directory `/home/user/Temp/HPC_SDK/CUDA-Fortran/SDK/bandwidthTest'
make[1]: *** No rule to make target `bandwidthTest.out', needed by `run'.  Stop.
make[1]: Leaving directory `/home/user/Temp/HPC_SDK/CUDA-Fortran/SDK/bandwidthTest'
make[1]: Entering directory `/home/user/Temp/HPC_SDK/CUDA-Fortran/SDK/bandwidthTest'
Cleaning up...
make[1]: Leaving directory `/home/user/Temp/HPC_SDK/CUDA-Fortran/SDK/bandwidthTest'

Also, the NVCC from the SDK does not work:

[user@host SDK]# which nvcc
/opt/nvidia/hpc_sdk/Linux_ppc64le/20.9/compilers/bin/nvcc
[user@host SDK]# nvcc
nvcc-Error-The nvc++ host compiler is only supported with CUDA 11.0 or newer

Installing and configuring cuda-compat-11-0 didn’t help.

Can you please tell me how I can solve these problems?

Best regards,
Sergey.

Hi Sergey,

When you downloaded 20.9, did you download the package with multiple CUDA versions (including CUDA 10.1) or the one with only CUDA 11.0? You can see which CUDA versions were installed by looking in the “Linux_ppc64le/20.9/cuda/” directory.

By default, the compiler will auto-detect the CUDA driver installed and then use the appropriate version. However, you will get these types of errors if that version is not included with the compilers.
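
A minimal sketch of that check, assuming the default install prefix that appears elsewhere in this thread (/opt/nvidia/hpc_sdk). The helper name list_cuda_versions is hypothetical and is not part of the SDK.

```shell
# Hypothetical helper: print the CUDA toolkit versions bundled under an
# HPC SDK release directory (for example .../Linux_ppc64le/20.9).
list_cuda_versions() {
    sdk_release_dir=$1
    for d in "$sdk_release_dir"/cuda/[0-9]*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Assumed install location; adjust to match your system.
list_cuda_versions /opt/nvidia/hpc_sdk/Linux_ppc64le/20.9

# Show the installed driver version, if nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader \
        || echo "could not query the driver" >&2
fi
```

If the CUDA version matching the driver is missing from that list, link errors like the one above are the expected symptom.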

Can you please tell me how I can solve these problems?

You can update your driver to one that supports CUDA 11.0 (version 450 or later), or set the environment variable “CUDA_HOME” to point to a separate CUDA 10.1 SDK installation.

If you aren’t able to update the CUDA driver, you’ll want to use the nvcc included in the CUDA 10.1 SDK.
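
As a rough sketch of the CUDA_HOME option: the path /usr/local/cuda-10.1 below is only an assumed location for a separate CUDA 10.1 installation, and use_cuda_home is a hypothetical helper that refuses to set the variable unless an nvcc binary is actually present there.

```shell
# Hypothetical helper: point CUDA_HOME at a toolkit directory, but only
# if that directory really contains an executable bin/nvcc.
use_cuda_home() {
    candidate=$1
    if [ -x "$candidate/bin/nvcc" ]; then
        export CUDA_HOME=$candidate
        echo "CUDA_HOME set to $CUDA_HOME"
    else
        echo "no nvcc found under $candidate; CUDA_HOME left unchanged" >&2
        return 1
    fi
}

# Assumed location of a standalone CUDA 10.1 toolkit; adjust as needed.
use_cuda_home /usr/local/cuda-10.1 || true
```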

-Mat

Hi Mat,

Thanks for the answer!

I am using the package with multiple versions of CUDA (10.1, 10.2 and 11.0). I also tried setting the CUDA_HOME variable, but it didn’t help. The problem was solved by loading the environment module file that comes with the SDK. It seems that setting the environment variables specified in the installation guide (https://docs.nvidia.com/hpc-sdk/archive/20.9/hpc-sdk-install-guide/index.html#install-linux-end-usr-env-settings) is not enough for the compilers to work properly.
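
For anyone hitting the same issue, loading the module might look roughly like the sketch below. The modulefiles directory and the module name nvhpc/20.9 are assumptions based on the default install prefix, so check what exists on your system (for example with "module avail") before relying on them. The wrapper function load_nvhpc_module is hypothetical.

```shell
# Hypothetical wrapper: load the HPC SDK module if Environment Modules
# and the SDK's modulefiles directory are both available.
load_nvhpc_module() {
    modulefiles_dir=$1
    if ! command -v module >/dev/null 2>&1; then
        echo "the 'module' command is not available in this shell" >&2
        return 1
    fi
    if [ ! -d "$modulefiles_dir" ]; then
        echo "no modulefiles directory at $modulefiles_dir" >&2
        return 1
    fi
    # Module name assumed; verify it with 'module avail'.
    module use "$modulefiles_dir" && module load nvhpc/20.9
}

# Assumed default location of the SDK's modulefiles.
load_nvhpc_module /opt/nvidia/hpc_sdk/modulefiles || true

# After a successful load, check which compiler is picked up.
command -v nvfortran || echo "nvfortran is not on PATH"
```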

The only remaining issue now is the non-working nvcc. Is it enough to prepend the directory “/opt/nvidia/hpc_sdk/Linux_ppc64le/20.9/cuda/10.1/bin” to PATH, or do I also need to prepend the CUDA 10.1 header and library directories to CPATH and LD_LIBRARY_PATH, respectively? Will this configuration be fully functional?

At the moment I cannot update the driver to version 450, since only version 440 is available for CentOS 7.6 on the ppc64le architecture. Switching to the newer driver would require installing Red Hat 8, which some of the software I use does not support.

Best regards,
Sergey.

The nvcc in the compiler bin directory is CUDA 11.0, but you should be able to use the nvcc from an older bundled CUDA version (10.1 in your case) by putting its bin directory, e.g. “Linux_ppc64le/20.9/cuda/10.1/bin”, on your PATH ahead of the compiler directory, “Linux_ppc64le/20.9/compilers/bin”.

For example, on a system where CUDA 10.0 is installed (adjust the base path and CUDA version as needed):

% which nvcc
/proj/nv/Linux_ppc64le/20.9/compilers/bin/nvcc
% nvcc --version
nvcc-Error-The nvc++ host compiler is only supported with CUDA 11.0 or newer
% export PATH=/proj/nv/Linux_ppc64le/20.9/cuda/10.0/bin/:$PATH
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:10:00_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
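
Adapted to the CUDA 10.1 directory mentioned earlier in the thread, the same approach could be sketched as follows. The CUDA_BIN variable name is introduced here only for illustration. The snippet avoids adding the directory twice if it is already first on PATH.

```shell
# Bundled CUDA 10.1 bin directory, as given earlier in the thread.
CUDA_BIN=/opt/nvidia/hpc_sdk/Linux_ppc64le/20.9/cuda/10.1/bin

# Put it ahead of the compiler bin directory unless it is already first.
case ":$PATH:" in
    ":$CUDA_BIN:"*) ;;
    *) PATH=$CUDA_BIN:$PATH; export PATH ;;
esac

# Confirm which nvcc the shell now resolves.
if command -v nvcc >/dev/null 2>&1; then
    command -v nvcc
    nvcc --version
else
    echo "nvcc not found on PATH" >&2
fi
```

Putting these lines after the SDK environment setup in a shell startup file, or in a job script, keeps the ordering consistent across sessions.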