Unable to compile on login node

Hi all!

I am trying to use the latest version of the HPC SDK.
Unfortunately, I am unable to compile code on the login node:
nvc-Error-Unsupported processor

It only works if I request a node with GPUs installed.
Is there any way to overcome this issue?

What’s the processor on the login node?

To cross-compile for a different CPU, use the “-tp” flag. Run “nvc -help -tp” to see the list of supported processors, then pass “-tp=<processor>”, replacing “<processor>” with the target system’s processor.
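
For example, a minimal sketch targeting a Skylake CPU (the processor value and source file name are illustrative; pick a processor from the “nvc -help -tp” list for your target system):

nvc -tp=skylake -O2 -o myprog myprog.c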

To cross-compile for the GPU, use the “-gpu=cudaXX,ccNN” flag, where “cudaXX” is the closest CUDA version to that of the CUDA driver you’re targeting, and “ccNN” is the compute capability of the target device.
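
For example, an illustrative invocation targeting compute capability 8.0 with a CUDA 11.0 toolchain (the values are assumptions; match your target driver and device):

nvc -acc -gpu=cuda11.0,cc80 -o myprog myprog.c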

Passing the target processor with the “-tp” flag resulted in the same error.

Result of the lscpu command:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            15
Model:                 6
Model name:            Common KVM processor
Stepping:              1
CPU MHz:               2594.190
BogoMIPS:              5188.38
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm

It looks like this system might be a bit too old. While the processor name seems to have been overridden by the VM, Intel Family 15 Model 6 appears to be a Cedar Mill Pentium 4 chip.

We require the system to support the AVX instruction set, so Sandy Bridge or newer.

So, there is no way to overcome this issue?
Back in the day I was using version 21.2, and it still works fine (it compiles on this processor).
And I was hoping to check out the cuFFTMp library.

So, there is no way to overcome this issue?

No, sorry. We made this change a few years ago, and the issue is with how our runtime is built. We were leaving a lot of performance on the table by having to support pre-2011 processors.

I should mention that you can download older releases at NVIDIA HPC SDK Releases | NVIDIA Developer if you need to revert to the 21.2 release.

Is there any way I can try the cuFFTMp library with this old version of the HPC SDK?

Possibly, but it’s not something I’ve tried. Though if you can, you’re better off compiling with our latest NVHPC release on one of the compute nodes.

The first problem would be that cuFFTMp shipped with CUDA 12.3 and later, so you’d need to install the new CUDA SDK and then set the environment variable “NVHPC_CUDA_HOME” to point to this install so the compilers know to use it. Since the latest CUDA version that shipped with 21.2 was CUDA 11.2, you would be crossing a major CUDA release, which isn’t supported, but there’s a good chance it will be ok. I just can’t be sure.
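
For example, assuming the new toolkit was installed under /usr/local/cuda-12.3 (the path is illustrative):

export NVHPC_CUDA_HOME=/usr/local/cuda-12.3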

Next, the 21.2 cuFFTxt module won’t include any of the cuFFTMp interfaces, so you’d need to write them yourself, though they are documented in the cuFFTMp documentation.

Finally, given the “-cudalib” helper flag won’t be able to find the CUDA libraries, you’ll need to link against them manually.
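
A hedged sketch of what the manual link line might look like (the paths, the nvfortran driver, and the -lcufftMp / -lnvshmem library names are assumptions for illustration; consult the cuFFTMp documentation for the exact set of libraries):

nvfortran myapp.o -o myapp -L/path/to/cufftmp/lib -lcufftMp -L/path/to/nvshmem/lib -lnvshmem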

Thank you, Mat

I have another question, unrelated to the current topic.
How can I link libnvcpumath.so and the other SDK libraries statically?
I am having a very strange issue.
When I run an MPI program on a single node, with the number of processes matching the number of GPUs, everything works great. But when I try running on two nodes, suddenly an error appears:

/hpc_sdk/Linux_x86_64/21.7/comm_libs/openmpi4/openmpi-4.0.5/bin/.bin/orted: error while loading shared libraries: libnvcpumath.so: cannot open shared object file: No such file or directory

I have tried passing “-static-nvidia”, but it did not help; dynamic linking is still used.

Passing “-static” results in the following error:

/usr/bin/ld: attempted static link of dynamic object `/hpc_sdk/Linux_x86_64/21.7/comm_libs/openmpi4/openmpi-4.0.5/lib/libmpi_usempif08.so'

The message here seems to indicate that the issue is with the OpenMPI Open RTE daemon, “orted”, which gets invoked by mpirun.

In other words, it doesn’t really matter whether your binary is being statically linked or not, since the dependency seems to be coming from the OpenMPI tool chain.

Now what I don’t quite understand is why orted would have this dependency given it’s not a dynamic executable. So there may be some missing information or something else going on.

Though you should consider making the compiler runtime available on all the nodes: either via a shared NFS mount, by copying the contents of the “/hpc_sdk/Linux_x86_64/21.7/REDIST” directory to the nodes, or by packaging them as part of your program’s binary package. Then set the environment variable LD_LIBRARY_PATH to include the path to the “compilers/lib/” directory so the loader can find them.
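
For example, a minimal sketch assuming the REDIST contents were copied to /opt/nvhpc-redist on each node (the destination path is illustrative):

export LD_LIBRARY_PATH=/opt/nvhpc-redist/compilers/lib:$LD_LIBRARY_PATH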

Thank you, Mat

I have successfully resolved the latest issue with the shared libraries not loading. My environment variables were set inside .bash_profile instead of .bashrc.
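
In case it helps anyone else: as I understand it, the shells spawned on the remote nodes by mpirun are non-login shells, so they read ~/.bashrc but not ~/.bash_profile. Moving the export there (the SDK path is the one from earlier in this thread) was enough:

export LD_LIBRARY_PATH=/hpc_sdk/Linux_x86_64/21.7/compilers/lib:$LD_LIBRARY_PATH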

The original issue remains unresolved. It is a linking nightmare.
It turns out I am unable to compile on the compute node, since libatomic.so is not found there. And I am unable to link everything on the login node, since linking requires libcuda.so, which is only available on the compute nodes…

I am unable to compile on the compute node, since libatomic.so is not found there.

“libatomic” comes with the OS, but possibly your sys admins didn’t install it by default. I believe it’s a dependency of the GCC compiler’s development packages. See: Error while loading shared libraries: libatomic.so.1: - #2 by MatColgrove
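
For example, on a RHEL/CentOS-style system (the package name may differ on other distributions):

sudo yum install libatomic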

You’ll have other issues as well, not just with our compilers but also with GNU, so you may want to talk to your sys admins. Granted, they may not have installed the development packages on purpose, thus preventing the compute nodes from being used for development.

And I am unable to link everything on the login node, since linking requires libcuda.so, which is only available on the compute nodes…

Typically a program doesn’t need to link against the CUDA driver, and it’s not something we’d implicitly add to the link line. Though if you do need it, there’s a stub library you can try linking against:

/install/path/Linux_x86_64/21.7/cuda/11.0/targets/x86_64-linux/lib/stubs/libcuda.so
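
For example, a hedged sketch of a link line using the stub (adjust the install path to your system; the object and output file names are illustrative):

nvc myapp.o -o myapp -L/install/path/Linux_x86_64/21.7/cuda/11.0/targets/x86_64-linux/lib/stubs -lcuda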
