Compiler issues when migrating from 20.7 to 22.1

Hi,

I upgraded the NVIDIA HPC SDK from 20.7 to 22.1 and now get some compile errors whose cause I cannot find.
One is the following:
/bin/ld: cannot find -lcuda
pgacclnk: child process exit status 1: /bin/ld
The other is:
pgc++-Fatal-/nvhpc_sdk/Linux_x86_64/22.1/compilers/bin/tools/cpp2 TERMINATED by signal 11
Arguments to nvhpc_sdk/Linux_x86_64/22.1/compilers/bin/tools/cpp2

The second error message in particular doesn’t tell me much. Are there any caveats to be aware of when migrating to newer nvc++ versions?

Hi Rob_v8,

/bin/ld: cannot find -lcuda
pgacclnk: child process exit status 1: /bin/ld

“libcuda.so” is the CUDA driver runtime library, but it’s not something we link against directly; rather, it’s dynamically loaded at runtime. Is your build explicitly adding “-lcuda” to the link?
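To illustrate what “dynamically loaded at runtime” means, here is a minimal sketch that probes for a shared library with dlopen/dlsym instead of linking it with “-l”. It is not the NVHPC runtime’s actual code; the function names are hypothetical, and it assumes a Linux/glibc system (on glibc older than 2.34, add “-ldl” to the link).

```cpp
// Sketch: probe for a shared library at runtime instead of linking it with -l.
// Illustration only; not how the NVHPC runtime is implemented.
#include <cassert>
#include <dlfcn.h>

// True if the dynamic loader can load `soname` on this machine.
bool library_loadable(const char* soname) {
    void* handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) return false;  // dlerror() would explain why
    dlclose(handle);
    return true;
}

// True if `soname` loads and exports `symbol`.
bool symbol_available(const char* soname, const char* symbol) {
    void* handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) return false;
    bool found = dlsym(handle, symbol) != nullptr;
    dlclose(handle);
    return found;
}
```

For example, library_loadable("libcuda.so.1") reports whether the driver library is present, and symbol_available("libcuda.so.1", "cuInit") whether it exports the driver API entry point. Because the check happens at runtime, the binary links fine on a machine without the driver.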

Can you post an example of your link line?

pgc++-Fatal-/nvhpc_sdk/Linux_x86_64/22.1/compilers/bin/tools/cpp2 TERMINATED by signal 11
Arguments to nvhpc_sdk/Linux_x86_64/22.1/compilers/bin/tools/cpp2

The back-end C++ compiler is seg faulting for some reason. Can you please provide a minimal reproducing example that shows the error?

Thanks,
Mat

Hi Mat,

the link line that produces the “-lcuda” error is the following:

"/nvhpc_sdk/Linux_x86_64/22.1/compilers/bin/pgc++" -Wl,-R -Wl,-rpath-link -Bstatic_pgi -Wl,-Bstatic -lscl_th-O -lBLT -ltk8.6 -ltcl8.6 -larprec -lqd -larpack -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -Wl,-Bdynamic -lgomp -lrt -lgfortran -ldl -lmpi -lmpicxx -Wl,--end-group -s -ta=tesla:cc35,cc60,cc70,cc75,cc80,cuda11.0,maxregcount:48 -Mcudalib=cusolver,cublas,cusparse -fpic LDDIR=/depot/binutils-2.33.1_gold_beta/bin -mp -tp=px -lstdc++ -lgcc -lgcc_s -lpthread -lm -lc -Wl,--strip-all

For the second error, it's a large project, so distilling a minimal example to reproduce the error may be rather difficult.

Regards,
Rob

It looks like it’s the cuSPARSE library that needs the CUDA driver runtime library. Do you have a CUDA driver installed on this system?

For the second error, it's a large project, so distilling a minimal example to reproduce the error may be rather difficult.

Would you be able to make the source file that’s causing the issue, along with its dependent header files, available?

Hi Mat,

yes, when the path to the CUDA library is added, the first error is fixed. Why doesn’t the pgc++ linker add it automatically? Is the user supposed to add it?
I am still investigating the second error. It seems to occur when OpenACC and OpenMP are used in the same code, but I don’t yet have code I can send for reproduction.
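One hypothetical way to start a reproducer is a skeleton with one OpenMP reduction and one OpenACC reduction in the same source file, grown toward the real code until cpp2 crashes. The function names are made up, and it assumes a build along the lines of “nvc++ -mp -acc”; compilers without OpenMP/OpenACC support ignore the pragmas and run the loops serially.

```cpp
// Hypothetical reproducer skeleton: OpenMP and OpenACC reductions in one file.
#include <cassert>
#include <vector>

// Host-side sum using OpenMP threads.
double sum_omp(const std::vector<double>& v) {
    double s = 0.0;
    const double* p = v.data();
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; ++i) s += p[i];
    return s;
}

// Same sum offloaded with OpenACC; copyin moves the array to the device.
double sum_acc(const std::vector<double>& v) {
    double s = 0.0;
    const double* p = v.data();
    const long n = static_cast<long>(v.size());
    #pragma acc parallel loop reduction(+:s) copyin(p[0:n])
    for (long i = 0; i < n; ++i) s += p[i];
    return s;
}
```

Both functions should return the same value; pieces of the real code (static variables, headers, other directives) can then be added one at a time.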

Regards,
Rob

“libcuda.so” typically gets installed in a system library directory such as “/usr/lib64” or “/usr/lib/gcc/x86_64-linux-gnu/”, which the linker searches by default. Is yours installed in a non-default directory?
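As a quick check of whether the library sits in one of the usual places, a C++17 sketch like the following could be used. The directory list is an assumption based on common Linux layouts (the middle entry is the Debian/Ubuntu multiarch directory); the linker’s real default search path depends on the distribution and its configuration.

```cpp
// Sketch: report which candidate directory contains libcuda.so, if any.
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns the first directory in `dirs` that holds `libname`, or "" if none does.
std::string find_library_dir(const std::vector<std::string>& dirs,
                             const std::string& libname = "libcuda.so") {
    for (const auto& d : dirs) {
        std::error_code ec;  // missing or unreadable dirs just count as "not found"
        if (fs::exists(fs::path(d) / libname, ec)) return d;
    }
    return "";
}

// Assumed candidate directories; adjust for the system at hand.
const std::vector<std::string> kDefaultDirs = {
    "/usr/lib64",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib/gcc/x86_64-linux-gnu",
};
```

An empty result from find_library_dir(kDefaultDirs) would be consistent with the driver either not being installed or living in a non-default directory.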

Hi Mat,

Yes, CUDA is not installed on the machine that compiles the code. Thanks for the help!
For the other problem, I found that the issue comes from static variables in an OpenMP reduction clause; when I change the variables to non-static, the code compiles fine. I didn’t find any restriction on static variables in the OpenMP manual, so could this be an issue with the pgc++ compiler? I tried with NVIDIA HPC SDK 21.11 and 22.1.
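A minimal sketch of the pattern described above, together with the non-static workaround. It assumes a file-scope static (the thread doesn’t say which kind of static variable was involved), and the names are hypothetical. Built with OpenMP enabled (e.g. “nvc++ -mp” or “g++ -fopenmp”), both functions should return the same sum; without OpenMP the pragmas are ignored.

```cpp
// Sketch: static variable in an OpenMP reduction, plus a non-static workaround.
#include <cassert>

static long total_static = 0;  // file-scope static, as in the reported pattern

// Reduction directly on the static variable (should be legal OpenMP).
long sum_into_static(long n) {
    total_static = 0;
    #pragma omp parallel for reduction(+:total_static)
    for (long i = 1; i <= n; ++i) total_static += i;
    return total_static;
}

// Workaround: reduce into a non-static local, then store the result.
long sum_via_local(long n) {
    long local = 0;
    #pragma omp parallel for reduction(+:local)
    for (long i = 1; i <= n; ++i) local += i;
    total_static = local;
    return total_static;
}
```

The workaround keeps the static as the place the result ends up, while the reduction clause only ever sees an ordinary local variable.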

Regards,
Rob

could this be an issue with the pgc++ compiler?

Static variables should be legal in a reduction, but even if they weren’t, the compiler should give an error, not a segv. It’s definitely a compiler issue.

I tried a simple example using a static int in a reduction, and it seemed to work fine, so there’s something more to it. If you can get us a reproducing example, I can then report it and have engineering get it fixed.
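Since a plain static int works, one hypothetical way to narrow things down is to try the reduction on statics declared in different places, compiling each variant on its own to see which one (if any) crashes cpp2. These variants are guesses, not known triggers, and the names are made up.

```cpp
// Hypothetical bisection aid: the same reduction on statics in different places.
#include <cassert>

// Variant A: function-local static in an ordinary function.
int sum_local_static(int n) {
    static int acc;
    acc = 0;
    #pragma omp parallel for reduction(+:acc)
    for (int i = 1; i <= n; ++i) acc += i;
    return acc;
}

// Variant B: function-local static inside a function template.
template <typename T>
T sum_template_static(T n) {
    static T acc;
    acc = 0;
    #pragma omp parallel for reduction(+:acc)
    for (T i = 1; i <= n; ++i) acc += i;
    return acc;
}

// Explicit instantiation so the template body is compiled even if never called.
template long sum_template_static<long>(long);

// Variant C: function-local static inside a member function.
struct Summer {
    int run(int n) const {
        static int acc;
        acc = 0;
        #pragma omp parallel for reduction(+:acc)
        for (int i = 1; i <= n; ++i) acc += i;
        return acc;
    }
};
```

If one variant reproduces the segfault on its own, that file alone would make a small test case to send in.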