I have just installed the NVIDIA HPC SDK (the version bundled with multiple versions of CUDA), and I want to use the module system to control which CUDA version is active, using the module files in the /opt/nvidia/hpc_sdk/modulefiles directory. But when I loaded the CUDA 11 module, I found that nvcc is still version 12.1. To avoid conflicts, I have not set any environment variables other than through the module system.
The following commands illustrate the problem (I am on Ubuntu 22.04 LTS). I first check that there is no nvcc in my PATH before loading the module, then load the CUDA 11 module, then check the nvcc version, which is 12.1.
$ source /etc/profile.d/modules.sh
$ nvcc --version
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
$ module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc-hpcx-cuda11/23.5
Loading /opt/nvidia/hpc_sdk/modulefiles/nvhpc-hpcx-cuda11/23.5
Loading requirement: hpcx
$ module list
Currently Loaded Modulefiles:
1) hpcx 2) /opt/nvidia/hpc_sdk/modulefiles/nvhpc-hpcx-cuda11/23.5
Key:
auto-loaded
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
However, PATH and LD_LIBRARY_PATH appear to be set correctly for version 11.8.
$ echo $PATH
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/clusterkit/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/tests/imb:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/sharp/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/hcoll/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ucc/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ucx/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/home/tuochuyi/.local/share/JetBrains/Toolbox/scripts
$ echo $LD_LIBRARY_PATH
/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ompi/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/nccl_rdma_sharp_plugin/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/sharp/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/hcoll/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ucc/lib/ucc:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ucc/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ucx/lib/ucx:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/11.8/hpcx/hpcx-2.14/ucx/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/nvshmem/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/nccl/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/math_libs/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/compilers/extras/qd/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/cuda/extras/CUPTI/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/23.5/cuda/lib64
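One thing worth checking here is which nvcc the shell actually resolves, since in the PATH above compilers/bin comes before cuda/bin. Below is a minimal sketch of a helper that walks PATH in lookup order and prints every match. The name list_on_path is my own and is not part of the SDK or of Environment Modules; it only assumes a POSIX shell.

```shell
# list_on_path NAME
# Print every executable called NAME found on PATH, in lookup order.
# The first line printed is the one the shell will run for a bare "NAME".
# (list_on_path is a hypothetical helper name, not an SDK or modules command.)
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # split PATH on colons
    for dir in $PATH; do
        dir=${dir:-.}           # an empty PATH entry means the current directory
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs                # restore the original field separator
}

list_on_path nvcc
```

If the first line printed is the nvcc under compilers/bin rather than one under a versioned cuda directory, that would at least show which binary is reporting 12.1.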
With the find command I can locate several nvcc binaries of different versions:
/opt/nvidia/hpc_sdk $ find . -name "nvcc"
./Linux_x86_64/23.5/compilers/bin/nvcc
./Linux_x86_64/23.5/cuda/12.1/bin/nvcc
./Linux_x86_64/23.5/cuda/11.8/bin/nvcc
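As a stopgap, the 11.8 binary listed above can be called by its full path, bypassing PATH lookup entirely. This is only a sketch: the variable names NVHPC_ROOT, CUDA_VER and NVCC are my own choices, and the prefix is taken from the find output on my machine.

```shell
# Build the absolute path to a specific nvcc and run it directly.
# NVHPC_ROOT, CUDA_VER and NVCC are illustrative variable names (assumptions),
# and the prefix is the one from the find output above.
NVHPC_ROOT=/opt/nvidia/hpc_sdk/Linux_x86_64/23.5
CUDA_VER=11.8
NVCC="$NVHPC_ROOT/cuda/$CUDA_VER/bin/nvcc"

if [ -x "$NVCC" ]; then
    "$NVCC" --version
else
    echo "nvcc not found at $NVCC" >&2
fi
```

This does not explain why the module picks 12.1, but it makes it possible to confirm that the 11.8 compiler itself works.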
Why does loading the module not give me the matching version of nvcc? Is this expected behavior?
Any help or comments are welcome. Thank you!