Nvhpc Fortran code linking problem

Hi everyone,

I’m trying to build a post-processing code written in Fortran 90 (MPI with OpenACC acceleration) using mpif90 under nvhpc-23.1 on “Leonardo”, the CINECA cluster in Italy. When I build the code with:

mpif90 -acc=gpu -target=gpu -gpu=cc80 -o post_code_name “objects list” -fortranlibs -cudalib -acclibs

I obtain the following:

/usr/bin/ld: cannot find -lcusolverMp
/usr/bin/ld: cannot find -lcal
/usr/bin/ld: cannot find -lcutensor
/usr/bin/ld: cannot find -lcutensorMg
/usr/bin/ld: cannot find -lnccl
/usr/bin/ld: cannot find -lnvshmem_device
/usr/bin/ld: cannot find -lnvshmem_host
pgacclnk: child process exit status 1: /usr/bin/ld

For clarity, the modules I loaded on the Leonardo cluster are:

1) profile/base   
2) nvhpc/23.1   
3) zlib/1.2.13--gcc--11.3.0   
4) openmpi/4.1.4--nvhpc--23.1-cuda-11.8   
5) cuda/11.8

However, when I build the same code on my local cluster under nvhpc-24.1, it links without errors. Unfortunately, nvhpc-24.1 is not installed on the Leonardo cluster.

Again, for clarity, the modules I loaded on my local cluster are:

1) base   
2) gcc/gcc(default)   
3) pgi/pgi_20.4   
4) intel/intel20(default)   
5) lsf10.1   
6) pfs   
7) nvhpc-22.3

Any suggestions about this issue? Thanks in advance,

-Matteo

These are the CUDA libraries that get added when using the “-cudalib” flag.

The exact path to these libraries depends on the CUDA version in use, which is either set explicitly via “-gpu=cudaXX.yy” or set implicitly by whichever CUDA driver is installed. If the node doesn’t have a CUDA driver installed, you may need to set the CUDA version explicitly.
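A quick way to tell whether the node you are building on has a CUDA driver is to look for nvidia-smi, which is installed together with the driver. The following is only a rough sketch: the function name has_cuda_driver is made up for illustration, and it assumes that a working driver means nvidia-smi is on your PATH and exits cleanly.

```shell
# Hypothetical helper: succeeds only if nvidia-smi exists and runs cleanly.
has_cuda_driver() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}

if has_cuda_driver; then
    echo "CUDA driver found"
else
    echo "no CUDA driver found; consider passing -gpu=cudaXX.yy explicitly"
fi
```

On a login or head node this will typically take the second branch, which is the case where the compiler cannot infer the CUDA version by itself.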

Running the command:

nvfortran -dryrun -cudalib x.o

will show you the link line, including all the rpaths to the library directories. For example, I see “-rpath /proj/nv/Linux_x86_64/23.1/comm_libs/12.0/nccl/lib”, which is the path to the NCCL libraries for CUDA 12.0.
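To check those directories without reading the link line by eye, something like the sketch below may help. The function name check_rpaths is hypothetical, and it assumes each rpath appears as a separate “-rpath DIR” pair of tokens, as in the example above; if your output uses a different form (for instance “-rpath=DIR”), the awk filter would need adjusting.

```shell
# Hypothetical helper: read a link line on stdin, print every directory that
# follows a "-rpath" token, and tag it with whether it exists on this machine.
check_rpaths() {
    tr -s ' \t' '\n' |
    awk 'prev == "-rpath" { print } { prev = $0 }' |
    sort -u |
    while IFS= read -r dir; do
        if [ -d "$dir" ]; then
            echo "found    $dir"
        else
            echo "MISSING  $dir"
        fi
    done
}

# Intended use (assumes nvfortran is on PATH; x.o is any object file).
# The 2>&1 is there in case the dryrun output goes to stderr:
#   nvfortran -dryrun -cudalib x.o 2>&1 | check_rpaths
```

Any directory reported as MISSING, or any path containing a CUDA version other than the one you expect, points to the mismatch described above.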

Double-check that the library paths use the correct CUDA version and that the libraries actually exist in those directories, since it’s possible they were removed.

That said, the most likely fix is to add “-gpu=cuda11.8” to the link line, since I presume you’re on a head node without a CUDA driver.
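As a sketch, the link line from the question with the CUDA version pinned might look like the following. The object file names are placeholders I made up, and 11.8 is chosen only because it matches the cuda/11.8 module listed in the question; all other flags are copied unchanged from the original command.

```shell
# Placeholder object list (hypothetical names); substitute the real object files.
OBJS="main.o post.o"
# CUDA version to pin; 11.8 matches the cuda/11.8 module in the question.
CUDA_VER="11.8"

# Original link flags from the question, plus the explicit CUDA version.
LINK_CMD="mpif90 -acc=gpu -target=gpu -gpu=cc80 -gpu=cuda${CUDA_VER} -o post_code_name ${OBJS} -fortranlibs -cudalib -acclibs"

# Print the command for inspection; run it with: eval "$LINK_CMD"
echo "$LINK_CMD"
```

Keeping the version in a variable makes it easy to switch if a different CUDA module is loaded later.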

You are completely right, the head node has no CUDA driver. Adding -gpu=cuda11.8 completely solved the issue!

God save @MatColgrove !!!
