HPCSDK 22.7 Installation issues

Hello, I am trying to install HPCSDK 22.7 to a network location that I have write access to. I run ./install and, when prompted for the installation directory, specify “/data/saet/mtml/software/x86_64/nvidia/hpc_sdk”.

I enabled GCC 10.2.x before starting the installation and selected “network installation” (see below). However, the makelocalrc step fails.

The HPC stack installs as follows:

 ./install

Welcome to the NVIDIA HPC SDK Linux installer!

You are installing NVIDIA HPC SDK 2022 version 22.7 for Linux_x86_64.
Please note that all Trademarks and Marks are the properties
of their respective owners.

Press enter to continue...


A network installation will save disk space by having only one copy of the
compilers and most of the libraries for all compilers on the network, and
the main installation needs to be done once for all systems on the network.

1  Single system install
2  Network install

Please choose install option: 
2

Please specify the directory path under which the software will be installed.
The default directory is /opt/nvidia/hpc_sdk, but you may install anywhere you wish,
assuming you have permission to do so.

Installation directory? [/opt/nvidia/hpc_sdk] 
/data/saet/mtml/software/x86_64/nvidia/hpc_sdk 
Common local directory on all hosts for shared objects? [/data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/share_objects]

Note: directory /data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/share_objects was created.


Installing NVIDIA HPC SDK version 22.7 into /data/saet/mtml/software/x86_64/nvidia/hpc_sdk
Output from gcc:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) 
COLLECT_GCC_OPTIONS='-o' '/tmp/tmp.3x9FPb8FZI/a.out' '-v' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/4.8.5/cc1 -quiet -v /tmp/tmp.3x9FPb8FZI/hello-7542.c -quiet -dumpbase hello-7542.c -mtune=generic -march=x86-64 -auxbase hello-7542 -version -o /tmp/ccz7cjMR.s
GNU C (GCC) version 4.8.5 20150623 (Red Hat 4.8.5-44) (x86_64-redhat-linux)
	compiled by GNU C version 4.8.5 20150623 (Red Hat 4.8.5-44), GMP version 6.0.0, MPFR version 3.1.1, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/util/gcc/gcc-10.2.0/lib/gcc/x86_64-redhat-linux/4.8.5/include"
ignoring nonexistent directory "/util/gcc/gcc-10.2.0/lib/gcc/x86_64-redhat-linux/4.8.5/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /data/saet/mtml/software/x86_64/RHEL7/intel/OneAPI/mkl/2022.1.0/include
 /data/saet/mtml/software/x86_64/RHEL7/intel/OneAPI/tbb/2021.6.0/include
 /usr/local/include
 /usr/include
End of search list.
GNU C (GCC) version 4.8.5 20150623 (Red Hat 4.8.5-44) (x86_64-redhat-linux)
	compiled by GNU C version 4.8.5 20150623 (Red Hat 4.8.5-44), GMP version 6.0.0, MPFR version 3.1.1, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 231b3394950636dbfe0428e88716bc73
In file included from /tmp/tmp.3x9FPb8FZI/hello-7542.c:1:0:
/usr/include/stdio.h:33:21: fatal error: stddef.h: No such file or directory
 # include <stddef.h>
                     ^
compilation terminated.
 
ERROR: Linker : not found
 ** makelocalrc step has FAILED.  Linker not found ** 
 ** See gcc output above ** 
Command used:
gcc -o /tmp/tmp.3x9FPb8FZI/a.out -v /tmp/tmp.3x9FPb8FZI/hello-7542.c
cat /tmp/tmp.3x9FPb8FZI/hello-7542.c:
#include <stdio.h>
int main()
{
printf ("Hi\n");
return 0;
}
 
Making symbolic links in /data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/2022

How can I run “makelocalrc” and “add_network_host” properly? Given that my installation location is “/data/saet/mtml/software/x86_64/nvidia/hpc_sdk”, what should the -net argument be?

Why is the makelocalrc using the system’s GCC (4.8.5) and not the one in the PATH which is 10.2.0?

thanks!
Michael

Hi Michael,

Why is the makelocalrc using the system’s GCC (4.8.5) and not the one in the PATH which is 10.2.0?

We’re actually debating this internally right now. The fix is very straightforward: edit makelocalrc at line 30 so the system PATH is appended rather than prepended.

However, the generated “localrc” file (or “localrc.” for a network install) is supposed to be configured for use with that system. So if it were to use a different GNU for the user that installed the package, but another user tries to use the compilers and was using a different GNU, then the NVHPC compilers wouldn’t be able to find the system headers and libraries. In other words, makelocalrc is attempting to create a base configuration for this particular system. Using whatever GNU is in the PATH would solve your issue, but may cause other issues later.
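As an illustration of the prepend-versus-append point, the sketch below shows how PATH ordering alone decides which gcc gets picked up. It does not touch makelocalrc; the two directories and the fake gcc scripts are hypothetical stand-ins for a system GCC and a module-provided GCC.

```shell
#!/bin/sh
# Sketch: PATH ordering decides which gcc a script resolves.
# The directories and fake gcc scripts below are stand-ins, not real compilers.
set -eu
work=$(mktemp -d)
mkdir -p "$work/system/bin" "$work/module/bin"

# Fake "system" gcc (stands in for /usr/bin/gcc 4.8.5).
printf '#!/bin/sh\necho 4.8.5\n' > "$work/system/bin/gcc"
# Fake "module" gcc (stands in for a gcc 10.2.0 module).
printf '#!/bin/sh\necho 10.2.0\n' > "$work/module/bin/gcc"
chmod +x "$work/system/bin/gcc" "$work/module/bin/gcc"

system_path="$work/system/bin"   # what an installer might add
user_path="$work/module/bin"     # what the user's environment already has

# System directories prepended: the system gcc is found first.
prepended=$(PATH="$system_path:$user_path"; gcc)
# System directories appended: the gcc already on the user's PATH wins.
appended=$(PATH="$user_path:$system_path"; gcc)

echo "prepended -> gcc $prepended"
echo "appended  -> gcc $appended"
rm -rf "$work"
```

With the stand-ins above, the prepended case reports 4.8.5 and the appended case reports 10.2.0, which mirrors the behavior seen in the install log.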

Note that for a network install, makelocalrc will get invoked the first time a compiler is invoked on a new system. However, it will still use the default GNU for the configuration.

makelocalrc does have options which you can use to set the GNU version to use in the config, i.e., “-gcc”, “-gpp”, and “-g77”. You can use these to create a configuration for each GNU version. This is particularly useful for sites that use modules. For example:

% makelocalrc -d . -x -gcc <path>/gcc-10.2.0/Linux_x86_64/bin/gcc -gpp <path>/gcc-10.2.0/Linux_x86_64/bin/g++ -g77 <path>/gcc-10.2.0/Linux_x86_64/bin/gfortran

Note that “-d” specifies the directory to output the file. Here I just use the current directory.

You can then copy the generated localrc to any directory and rename it. Then, in the module, set the environment variable “NVLOCALRC” and the compilers will use it rather than the system config.

% cp localrc /path/to/configs/localrc.gnu102
% export NVLOCALRC=/path/to/configs/localrc.gnu102
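For sites with several GNU modules, the same idea could be scripted. The following is only a sketch: the “localrc.gnu<major><minor>” naming scheme is an assumption modelled on the localrc.gnu102 example, and the config directory is the same placeholder path used there.

```shell
#!/bin/sh
# Sketch: map a GCC version string to a localrc file name and export NVLOCALRC.
# The naming scheme and config_dir are assumptions, not an NVHPC convention.
set -eu

config_dir=/path/to/configs

# Turn "10.2.0" into "localrc.gnu102" (major and minor fields only).
localrc_name() {
    version=$1
    major=${version%%.*}   # text before the first dot
    rest=${version#*.}     # text after the first dot
    minor=${rest%%.*}      # second field
    echo "localrc.gnu${major}${minor}"
}

# Hard-coded for the example; a module could take this from
# "gcc -dumpfullversion" (available in GCC 7 and later).
gcc_version=10.2.0

NVLOCALRC="$config_dir/$(localrc_name "$gcc_version")"
export NVLOCALRC
echo "NVLOCALRC=$NVLOCALRC"
```

The function expects a version with at least two dot-separated fields; a bare major version would need separate handling.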

A third option is to use the compiler flag “-gcc-toolchain=<path/to/gnu/bin>”, which will use this GNU install for the system headers and libraries instead of what’s in localrc. The caveat is that the user will need to add this to their compile line, and there is a slight bit of overhead. Not much, but it can add up when building a particularly large application.

Unfortunately, there’s no perfect solution that can accommodate all use cases, but we are always looking for ways to improve.

-Mat

Hey, Mat,

Thanks for the in-depth feedback! The NVLOCALRC envvar is quite useful.

I have another question that relates to CUDA’s runtime libraries libcudadevrt and libcudart_static: I use nvcc out of the CUDA 11.7 installation that comes with HPCSDK 22.7, but it fails to find these two libs at link time:

Excerpt :

-- Check for working CUDA compiler: /data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/bin/nvcc - broken
CMake Error at /data/saet/mtml/software/x86_64/RHEL7/cmake-3.24.1-linux-x86_64/share/cmake-3.24/Modules/CMakeTestCUDACompiler.cmake:102 (message):
  The CUDA compiler

    "/data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/bin/nvcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /dev/shm/mtml/src/GEOSX/thirdPartyLibs/build-GPU-GCC_10.2-release/raja/src/raja-build/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_03e08/fast && gmake[3]: Entering directory `/dev/shm/mtml/src/GEOSX/thirdPartyLibs/build-GPU-GCC_10.2-release/raja/src/raja-build/CMakeFiles/CMakeTmp'
    /usr/bin/gmake  -f CMakeFiles/cmTC_03e08.dir/build.make CMakeFiles/cmTC_03e08.dir/build
    gmake[4]: Entering directory `/dev/shm/mtml/src/GEOSX/thirdPartyLibs/build-GPU-GCC_10.2-release/raja/src/raja-build/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_03e08.dir/main.cu.o
    /data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/util/gcc/gcc-10.2.0/bin/g++   -Xcompiler=-fPIE -MD -MT CMakeFiles/cmTC_03e08.dir/main.cu.o -MF CMakeFiles/cmTC_03e08.dir/main.cu.o.d -x cu -c /dev/shm/mtml/src/GEOSX/thirdPartyLibs/build-GPU-GCC_10.2-release/raja/src/raja-build/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_03e08.dir/main.cu.o
    Linking CUDA executable cmTC_03e08
    /data/saet/mtml/software/x86_64/RHEL7/cmake-3.24.1-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/cmTC_03e08.dir/link.txt --verbose=1
    /util/gcc/gcc-10.2.0/bin/g++ CMakeFiles/cmTC_03e08.dir/main.cu.o -o cmTC_03e08  -lcudadevrt -lcudart_static -lrt -lpthread -ldl  -L"/data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/targets/x86_64-linux/lib/stubs" -L"/data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/targets/x86_64-linux/lib"
    /usr/bin/ld: cannot find -lcudadevrt
    /usr/bin/ld: cannot find -lcudart_static
    collect2: error: ld returned 1 exit status

It appears that nvcc does not know where the actual libs are in order to add the right -L.
They are actually at the “standard” CUDA location (there are other symlink paths leading to the same place):

/data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/2022/cuda/lib64/
OR
/data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/lib64/

Thanks!
Michael

Hi Michael,

Since we can ship multiple versions of CUDA with the HPC SDK, there’s an extra directory level for the CUDA version. So your CMake library include directories should be:

“/data/saet/mtml/software/x86_64/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/targets/x86_64-linux/lib”

The top level “lib64” symlink points to the same spot. Not sure why CMake is configured to link to the lower level directories rather than the top level, given they can change.
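One way to confirm where the static runtime really lives is to search under the SDK’s cuda directory and resolve symlinks. The sketch below builds a throwaway tree that only mimics the layout described here (a versioned 11.7 directory plus a lib64 symlink), so the paths are illustrative; on a real system cuda_root would point at the installed SDK instead.

```shell
#!/bin/sh
# Sketch: find the physical directory that holds libcudart_static.a.
# A temporary tree stands in for the SDK; its layout is an approximation.
set -eu
work=$(mktemp -d)
cuda_root="$work/Linux_x86_64/22.7/cuda"

# Versioned directory with the library, plus a top-level lib64 symlink.
mkdir -p "$cuda_root/11.7/targets/x86_64-linux/lib"
: > "$cuda_root/11.7/targets/x86_64-linux/lib/libcudart_static.a"
ln -s "11.7/targets/x86_64-linux/lib" "$cuda_root/lib64"

# find -L follows symlinks, so the lib64 alias is reported as well.
# Resolving each hit with "pwd -P" and de-duplicating leaves the real location.
lib_dirs=$(find -L "$cuda_root" -name 'libcudart_static.a' |
    while read -r f; do (cd "$(dirname "$f")" && pwd -P); done | sort -u)

echo "$lib_dirs"
rm -rf "$work"
```

The single directory printed is the one to hand to the linker with -L; in this mock tree it ends in 22.7/cuda/11.7/targets/x86_64-linux/lib.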

-Mat

Hi Mat,

Just to add my $0.02 here: I’ve been complaining about the current makelocalrc behavior for over a year now. In my experience, our HPC systems generally have an OS installed when the system is initially brought up, and that image is not updated very much over the life of the system. So, we end up with very old GCCs in /usr/bin, which no one uses! The actual programming environment is provided through modules, so the “correct” GCC is the one found on the PATH.

I’ve got an ugly script that generates a localrc file using all the -gcc etc. flags, but it would be a lot nicer to just have makelocalrc use the GCC on the path.
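For illustration only, a script along those lines might look like the sketch below. It is not Paul’s script: it simply assembles a makelocalrc command from whichever gcc, g++ and gfortran are first on the PATH, using only the flags quoted earlier in this thread (-d, -x, -gcc, -gpp, -g77), and it prints the command rather than running it. The stand-in compilers exist only so the snippet runs without a GNU toolchain.

```shell
#!/bin/sh
# Sketch: build a makelocalrc command line from the GNU compilers on PATH.
# The command is printed, not executed; review it before running it for real.
set -eu

build_makelocalrc_cmd() {
    outdir=$1
    # Fail early if any of the three compilers is missing.
    for tool in gcc g++ gfortran; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "error: $tool not found on PATH" >&2
            return 1
        }
    done
    echo "makelocalrc -d $outdir -x" \
         "-gcc $(command -v gcc)" \
         "-gpp $(command -v g++)" \
         "-g77 $(command -v gfortran)"
}

# Demo with empty stand-in "compilers" placed first on PATH.
demo=$(mktemp -d)
for name in gcc g++ gfortran; do
    printf '#!/bin/sh\n' > "$demo/$name"
    chmod +x "$demo/$name"
done
cmd=$(PATH="$demo:$PATH"; build_makelocalrc_cmd .)
echo "$cmd"
rm -rf "$demo"
```

In a module-based environment, the function would be called after loading the desired GCC module, and the resulting localrc could then be referenced through NVLOCALRC.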

Paul


Thanks for the input Paul. I’ll let management know and it may help with their decision.
