Problems installing and using the NVIDIA HPC SDK

Hi Mat, I ran into problems while installing and using the NVIDIA HPC SDK. My platform is Linux_x86_64. The graphics card is as follows:

Sun Jan 21 23:02:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 …      Off | 00000000:01:00.0  On |                  N/A |
| N/A   43C    P8              16W /  80W |    484MiB /  6144MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1255      G   /usr/lib/xorg/Xorg                          214MiB |
|    0   N/A  N/A      1551      G   /usr/bin/gnome-shell                        101MiB |
|    0   N/A  N/A      3414      G   …irefox/2987/usr/lib/firefox/firefox        121MiB |
|    0   N/A  N/A     10049      G   …sion,SpareRendererForSitePerProcess         37MiB |
+---------------------------------------------------------------------------------------+

When installing the NVIDIA HPC SDK, the following errors occurred:
/opt/nvidia/hpc_sdk/$NVARCH/22.11/compilers/bin/makelocalrc -x /opt/nvidia/hpc_sdk/$NVARCH/22.11 -net /usr/nvidia/shared/22.11
/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin/makelocalrc: line 152: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/nvaccelinfo: No such file or directory
find: '/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/../../cuda': No such file or directory
/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin/makelocalrc: line 164: bundled_cuda: bad array subscript

The environment variables are set as follows:

NVARCH=`uname -s`_`uname -m`; export NVARCH
NVCOMPILERS=/opt/nvidia/hpc_sdk; export NVCOMPILERS
MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/22.11/compilers/man; export MANPATH
PATH=$NVCOMPILERS/$NVARCH/22.11/compilers/bin:$PATH; export PATH

LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64;export LD_LIBRARY_PATH
LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk;export LD_LIBRARY_PATH

MANPATH=$MANPATH:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/man; export MANPATH
PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin:$PATH; export PATH
export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/mpi/bin:$PATH
export MANPATH=$MANPATH:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/mpi/man

A .so file was built with the NVIDIA HPC SDK without problems. However, when linking a program against that .so file, the following errors occurred:

nvc++ -acc -gpu=cuda11.8 -fast -cuda -cudalib=cufft -std=c++17 -Minfo=accel -lnppig -lnppc -lnppisu -lnppidei -lcudart -o test test.cpp -L/home/ssy/桌面/zd118/ZD_CLASS_PROCESS -lprocess
/usr/bin/ld: warning: libmpi.so.40, needed by /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcusolverMp.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnccl.so.2, needed by /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Waitall'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Comm_dup'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclSend'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclGetUniqueId'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Comm_get_attr'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Bcast'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Scatterv'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_op_prod'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Comm_split'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Send'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_op_min'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Irecv'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_int8_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_comm_world'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Reduce'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_c_float_complex'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Wait'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_uint16_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Ibcast'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclAllGather'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPIX_Query_cuda_support'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_op_sum'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_int32_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_int16_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Barrier'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclGroupEnd'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Comm_rank'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclRecv'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_double'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_uint8_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclBcast'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_datatype_null'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Scatter'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Iprobe'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclCommInitRank'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclCommDestroy'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_op_max'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_byte'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Recv'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Comm_free'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_float'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Gatherv'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_uint64_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Allgather'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_int64_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_int'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Isend'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclReduce'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Gather'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Comm_size'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_c_double_complex'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ompi_mpi_uint32_t'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `ncclGroupStart'
pgacclnk: child process exit status 1: /usr/bin/ld

I am very troubled by this and hope to receive your help.

The second error occurs because the cuSolverMP library needs to be linked with MPI, so linking with our MPI driver, mpic++, instead of nvc++ might work around the issue.

As to why cuSolverMP is getting linked in, I’m not sure. You’re not explicitly adding it, and none of these compiler flags implicitly include it. Possibly it’s a dependency of the NPP libraries? Do you need NPP? If not, try linking without it to see if it helps.

Looks like it can’t find the NCCL library either. That’s over in the comms directory, “/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/11.8/nccl/lib/”, so you may need to add this directory via the “-L” flag to your link.
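
As a rough sketch of how the extra directory could be passed to the link step: the helper name link_flags below is hypothetical, and pairing each "-L" with "-Wl,-rpath" is an assumption that follows the linker's own "try using -rpath" hint, not something confirmed for this setup.

```shell
# Hypothetical helper: for each directory given, print a -L flag (link-time
# search path) and a matching -Wl,-rpath flag (run-time search path).
link_flags() {
    flags=""
    for dir in "$@"; do
        dir=${dir%/}                      # drop a trailing slash, if any
        flags="$flags -L$dir -Wl,-rpath,$dir"
    done
    printf '%s\n' "${flags# }"            # strip the leading space
}

SDK=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11
link_flags "$SDK/comm_libs/11.8/nccl/lib/"
```

The printed flags could then be appended to the existing link line, followed by -lnccl.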

For the makelocalrc issue, are you running this command by hand or as part of the install script?

The actual error is because the “bindir” path is missing the “bin” directory so it can’t find nvaccelinfo. By default it should use the same bin directory as makelocalrc. Not sure why it’s missing.

If running by hand, try working around it by setting the bindir explicitly:

/opt/nvidia/hpc_sdk/$NVARCH/22.11/compilers/bin/makelocalrc /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/bin -x /opt/nvidia/hpc_sdk/$NVARCH/22.11 -net /usr/nvidia/shared/22.11
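
Before rerunning makelocalrc, it may help to confirm where nvaccelinfo actually lives. The following is only a sketch, assuming the default install prefix used in this thread; the function name find_tool is made up for illustration.

```shell
# Hypothetical helper: report which of the candidate directories contains an
# executable with the given name. Returns non-zero if none does.
find_tool() {
    tool=$1; shift
    for dir in "$@"; do
        if [ -x "$dir/$tool" ]; then
            printf '%s\n' "$dir/$tool"
            return 0
        fi
    done
    echo "$tool not found in: $*" >&2
    return 1
}

SDK=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11
# The error message shows makelocalrc looking in $SDK itself, while
# makelocalrc is installed in $SDK/compilers/bin.
find_tool nvaccelinfo "$SDK" "$SDK/compilers/bin" || true
```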

Compiling with the following command succeeded, but the program then failed at run time:

mpic++ -acc -gpu=cuda11.8 -fast -cuda -cudalib=cufft -std=c++17 -Minfo=accel  -lcudart -o test test.cpp -L/home/ssy/桌面/zd118/ZD_CLASS_PROCESS -lprocess 
./test: error while loading shared libraries: libcal.so: cannot open shared object file: No such file or directory

Does using mpic++ instead of nvc++ affect my program? I do not use any MPI functions.
I tried to remove the NPP flag as follows:

nvc++ -acc -gpu=cuda11.8 -fast -cuda -cudalib=cufft -std=c++17 -Minfo=accel  -lcudart -o test test.cpp -L/home/ssy/桌面/zd118/ZD_CLASS_PROCESS -lprocess

The previous issues still exist.

/usr/bin/ld: warning: libmpi.so.40, needed by /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcusolverMp.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so: undefined reference to `MPI_Waitall'
...
pgacclnk: child process exit status 1: /usr/bin/ld
This time the following warning no longer appears:
/usr/bin/ld: warning: libnccl.so.2, needed by /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/math_libs/11.8/lib64/libcal.so, not found (try using -rpath or -rpath-link)

I tried compiling with the following command:

nvc++ -acc -gpu=cuda11.8 -fast -cuda -cudalib=cufft -std=c++17 -Minfo=accel  -lcudart -o test test.cpp -L/home/ssy/桌面/zd118/ZD_CLASS_PROCESS -lprocess -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/11.8/nccl/lib/ -lnccl -L/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/comm_libs/mpi/lib/ -lmpi
The following error then occurred when running the program:
./test: error while loading shared libraries: libprocess.so: cannot open shared object file: No such file or directory

This is the Makefile I use to build libprocess.so:

NVCC=nvc++
NVCC_FLAGS= -v  -acc -gpu=managed -cuda -cudalib -std=c++17 -Minfo=accel  -lcudart -fPIC -fast -Xcomplier  #-lnppig -lnppc -lnppisu -lnppidei

SOURCES =libprocess.cpp
HEADERS=libprocess.h FFTuse.h CircularBuffer.h
LIBRARY_NAME=libprocess.so

all:$(LIBRARY_NAME)

$(LIBRARY_NAME):$(SOURCES)
		$(NVCC) $(NVCC_FLAGS) -shared -o $@ $^

clean:
	rm -f $(LIBRARY_NAME)

I have tried the instructions you gave, as follows:

sudo /opt/nvidia/hpc_sdk/$NVARCH/22.11/compilers/bin/makelocalrc /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/bin -x /opt/nvidia/hpc_sdk/$NVARCH/22.11 -net /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/share_objects
/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin/makelocalrc: line 152: /opt/nvidia/hpc_sdk/Linux_x86_64/22.11/nvaccelinfo: No such file or directory
find: '/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/../../cuda': No such file or directory
/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin/makelocalrc: line 164: bundled_cuda: bad array subscript

According to the installation guide, version 22.11 requires running this command, while version 23.11 does not. The failure does not seem to affect the use of the NVIDIA HPC SDK.

For the shared objects not being found, set your environment’s LD_LIBRARY_PATH variable to include the directories containing these libraries so the loader can find them.
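
A sketch of what that could look like: the directories are taken from the paths in the messages earlier in this thread, and the helper name append_ld_path is hypothetical. It appends to the existing value, since a second plain assignment to LD_LIBRARY_PATH replaces the first.

```shell
# Hypothetical helper: append a directory to LD_LIBRARY_PATH unless it is
# already listed, keeping whatever was there before.
append_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;                                   # already present
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
    esac
    export LD_LIBRARY_PATH
}

SDK=/opt/nvidia/hpc_sdk/Linux_x86_64/22.11
append_ld_path /home/ssy/桌面/zd118/ZD_CLASS_PROCESS    # libprocess.so
append_ld_path "$SDK/math_libs/11.8/lib64"              # libcal.so, libcusolverMp.so
append_ld_path "$SDK/comm_libs/11.8/nccl/lib"           # libnccl.so.2
echo "$LD_LIBRARY_PATH"
```

Placing these lines in the shell startup file, after any existing LD_LIBRARY_PATH assignments, would keep the setting across sessions.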

For the cusolverMP dependency, my guess is that it’s coming from libprocess.so given you’re linking with “-cudalib” which will link against all CUDA libraries, including cusolverMP. You should consider only using needed libraries, i.e. -cudalib=cufft. Removing the dependency will mean no dependency on libcal.so nor the need to compile with MPI.
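
To check this guess, the dynamic section of libprocess.so can be inspected for its direct dependencies. A minimal sketch, assuming the binutils readelf tool is installed; needed_libs is a hypothetical helper name.

```shell
# Hypothetical helper: read `readelf -d` output on stdin and print only the
# names of the shared libraries listed as NEEDED.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)]$/\1/p'
}

# Example (run in the directory that holds libprocess.so):
#   readelf -d libprocess.so | needed_libs
# If libcusolverMp or libcal shows up in the list, the library was linked
# against them; rebuilding with -cudalib=cufft instead of bare -cudalib
# should drop those entries.
```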

As to why makelocalrc is dropping the “bin” directory, so that nvaccelinfo cannot be found, I’m not sure. I can’t reproduce the error and there’s nothing I can find in the script that would cause this.

Though in the mid-23 releases we did change things so the general “localrc” file isn’t needed. Instead this info is gathered the first time the compilers are run on a system and stored in the user’s home directory.