Cmake cannot find CUDA

I’m trying to build a molecular dynamics package known as LAMMPS (https://lammps.sandia.gov/) on a computer with a Ryzen 7 8-core CPU and a GT 710 video card, with the HPC_SDK package installed.

The cmake build process can’t find the relevant CUDA files.

I have tried to follow the directions of the newest cmake “find CUDA” page, but I cannot find any combination of environment variables that enables cmake to find CUDA.

Put another way - there are header files here:
/opt/nvidia/hpc_sdk/Linux_x86_64/cuda/10.2/include
and here:
/opt/nvidia/hpc_sdk/Linux_x86_64/20.5/compilers/include

What’s the difference between these?

I guess I’m not quite understanding the basis for this question, but the first directory holds the include files specific to the CUDA 10.2 release, while the second holds the compiler’s own include files.

The environment variable CUDA_HOME is the base location of your CUDA installation. CUDA could be installed using the standalone CUDA SDK, which is often installed in “/usr/local/cuda” or “/opt/cuda” but really could be installed anywhere the user wishes.

As part of the HPC SDK, the CUDA components are bundled as well, but mostly for convenience of users. You certainly can still download the CUDA SDK separately if you wish.

I think ultimately what you’re asking is how to have the LAMMPS build use the CUDA install that ships with the HPC SDK? Instructions are found here: https://lammps.sandia.gov/doc/Build_extras.html It looks like you need to set CUDA_HOME to the directory of the CUDA version you wish to use, either one of those installed with the HPC SDK or one installed via the CUDA SDK.
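As a rough sketch of that step, the snippet below sets CUDA_HOME to the first CUDA root that exists on the machine. The candidate paths are the ones mentioned in this thread; the order of preference is an arbitrary assumption, so adjust it to the install you actually want.

```shell
# Candidate CUDA roots mentioned in this thread; the order is an arbitrary choice.
for candidate in \
    /opt/nvidia/hpc_sdk/Linux_x86_64/cuda/10.2 \
    /usr/local/cuda \
    /opt/cuda
do
    if [ -d "$candidate" ]; then
        # Use the first candidate that exists and stop looking.
        CUDA_HOME=$candidate
        export CUDA_HOME
        break
    fi
done

echo "CUDA_HOME=${CUDA_HOME:-not set}"
```

If none of the candidates exists, CUDA_HOME is left untouched and the last line says so.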

Let me put this another way. Never mind LAMMPS. cmake has announced that CUDA is one of the packages that it can ‘find’ through a typical build process:

https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#module:FindCUDAToolkit

so there is NORMALLY no need to specify any paths. I did specify a path to nvcc, per the stated search behavior.

However, when I run the initial cmake process (in which it DID succeed in finding OpenMP and MPI) with CUDA specified as an included package, I get the following:

CMake Error at /usr/local/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message):
Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found
version "10.2")

I conclude from this that the HPC_SDK CUDA distribution is non-standard; otherwise, cmake would find it.

I ‘installed’ the HPC_SDK package by running the ‘install’ command that came with it, and it put the various files along two trees starting with /opt/nvidia/hpc_sdk.

I’ve also spent a lot of time with https://lammps.sandia.gov/doc/Build_extras.html. Settings like CUDA_HOME are intended for builds using ‘make’ rather than ‘cmake’, which is not normally supposed to need them. I tried them anyway, but that did not fix anything.

The problem is that ANY location such as /usr/local/cuda cannot refer to both of these trees simultaneously. Is one set of include files not necessary for building a package with cmake?

Also, the SDK has a combination of options:

  • 2020? 20.5?
  • 10.1? 10.2? 11.0?

In an earlier thread it sounded like I need to match one of these with the version number of the driver, but the SDK documentation is silent.

I also note that nvc++ is located ONLY here, with no other versions anywhere else:

/opt/nvidia/hpc_sdk/Linux_x86_64/20.5/compilers/bin/nvc++

I’m rather confused at this point.

Given that users are free to install the CUDA SDK in any base location they like, my assumption is that the location of the CUDA installation that cmake uses is configurable.

I did find the following page: https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html. It looks like you may need to set “-DCUDAToolkit_ROOT=/path/to/cuda/installation” when using installations other than “/usr/local/cuda”. Though, not being an expert on cmake myself, I think questions on using cmake are probably best addressed by Kitware (the makers of cmake).
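A minimal sketch of what that could look like on the command line, assuming the bundled CUDA 10.2 tree is the one you want. CUDAToolkit_ROOT is the hint described on the FindCUDAToolkit page; the variable names in the error quoted earlier (CUDA_INCLUDE_DIRS, CUDA_CUDART_LIBRARY) are the ones used by cmake's older FindCUDA module, whose root hint is CUDA_TOOLKIT_ROOT_DIR, so passing both is a guess that covers either module. The snippet only prints the command; the source directory is a placeholder.

```shell
# Assumed root: the CUDA 10.2 tree bundled with the HPC SDK (path quoted earlier in the thread).
CUDA_ROOT=/opt/nvidia/hpc_sdk/Linux_x86_64/cuda/10.2

# CUDAToolkit_ROOT is read by the newer FindCUDAToolkit module;
# CUDA_TOOLKIT_ROOT_DIR is the corresponding hint for the older FindCUDA module.
CMAKE_CUDA_ARGS="-DCUDAToolkit_ROOT=$CUDA_ROOT -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_ROOT"

# Print the command instead of running it; replace the placeholder with the real source directory.
echo "cmake $CMAKE_CUDA_ARGS <path-to-source>"
```

Starting from an empty build directory is probably wise, since cmake caches the results of earlier failed searches.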

The problem is that ANY location such as /usr/local/cuda cannot refer to both of these trees simultaneously.

Again, I’m not an expert in cmake, but my assumption would be that CUDA_INCLUDE_DIRS and CUDA_CUDART_LIBRARY refer to locations under the root CUDA installation. So fixing where cmake looks for the CUDA install’s root directory may allow these to be found as well.
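One way to test that assumption before rerunning cmake is to check whether a candidate root actually contains the runtime header and the runtime library. The helper below is hypothetical (it is not part of cmake or the HPC SDK), and it assumes the usual Linux toolkit layout with headers in include and libcudart in lib64 or lib.

```shell
# check_cuda_root is a hypothetical helper, not part of cmake or the HPC SDK.
# It reports whether a candidate root contains the pieces named in the error:
# the runtime header (CUDA_INCLUDE_DIRS) and the runtime library (CUDA_CUDART_LIBRARY).
check_cuda_root() {
    root=$1
    status=0
    if [ ! -f "$root/include/cuda_runtime.h" ]; then
        echo "missing: $root/include/cuda_runtime.h"
        status=1
    fi
    # Assumes the usual Linux layout, where libcudart lives in lib64 or lib.
    if [ ! -e "$root/lib64/libcudart.so" ] && [ ! -e "$root/lib/libcudart.so" ]; then
        echo "missing: libcudart.so under $root/lib64 or $root/lib"
        status=1
    fi
    return $status
}

# Path quoted earlier in the thread; adjust to the root you intend to use.
check_cuda_root /opt/nvidia/hpc_sdk/Linux_x86_64/cuda/10.2 \
    || echo "this root may not satisfy cmake's search"
```

A root that passes this check is not guaranteed to work, but one that fails it is unlikely to.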

Also, the SDK has a combination of options:

  • 2020? 20.5?
  • 10.1? 10.2? 11.0?

“20.5” is the HPC Compiler (formerly PGI) installation for the 20.5 release. Additional releases can be co-installed, so if you install the upcoming 20.7 release, it would be installed next to 20.5 and would not overwrite it.

“2020” is a common directory for packages, such as OpenMPI or NetCDF, shared by all HPC Compilers released in 2020. No need to reinstall them if you install a new version of the compilers.

“10.1”, “10.2”, and “11.0” are the various CUDA installations packaged with the compilers for convenience. There’s no need to install them separately, but you certainly can configure the compilers to use your own CUDA SDK installation.
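To see this layout on a given machine, something like the following lists the directories under the install prefix and under its cuda subdirectory. The prefix is taken from the paths quoted earlier in the thread, and list_subdirs is a hypothetical helper written for this sketch.

```shell
# Assumed install prefix, taken from the paths quoted earlier in the thread.
SDK_ROOT=/opt/nvidia/hpc_sdk/Linux_x86_64

# list_subdirs is a hypothetical helper that prints the names of the directories under a path.
list_subdirs() {
    for d in "$1"/*/; do
        # Skip the unexpanded pattern when the path has no subdirectories.
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

echo "Entries under $SDK_ROOT:"
list_subdirs "$SDK_ROOT"
echo "Bundled CUDA versions under $SDK_ROOT/cuda:"
list_subdirs "$SDK_ROOT/cuda"
```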

In an earlier thread it sounded like I need to match one of these with the version number of the driver, but the SDK documentation is silent.

Details about the CUDA installations and configuration can be found in the HPC Compiler’s User’s Guide: https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#cuda-toolkit-versions

As noted in the documentation, the HPC Compilers do check the CUDA Driver version on the system to set the default CUDA version to use, but this is easily overridden via command line options or setting environment variables such as CUDA_HOME.

I also note that nvc++ is located ONLY here, with no other versions anywhere else:

/opt/nvidia/hpc_sdk/Linux_x86_64/20.5/compilers/bin/nvc++

I’m rather confused at this point.

nvc++ is the HPC C++ compiler and does not currently support compiling CUDA C programs. For CUDA C, you need to use the nvcc compiler. The HPC Compiler bin directory does include an nvcc compiler, but it’s actually just a wrapper which invokes the correct nvcc from the various CUDA co-installs depending on which version is being used.
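A quick way to check which nvcc the shell would pick up, whether the wrapper in the compilers bin directory or the one inside a specific CUDA tree, is sketched below. It uses only the shell's own command -v lookup and nvcc's --version flag.

```shell
# Show which nvcc (if any) is first on PATH; command -v is a POSIX shell builtin.
nvcc_path=$(command -v nvcc || true)

if [ -n "$nvcc_path" ]; then
    echo "nvcc resolves to: $nvcc_path"
    # nvcc --version reports the CUDA release this compiler belongs to.
    nvcc --version
else
    echo "nvcc is not on PATH"
fi
```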

The HPC SDK is a bundle of various NVIDIA and third-party products: the HPC Compilers, multiple versions of CUDA, profilers, CUDA enabled math libraries, and builds of third-party libraries. It’s not a single product. Perhaps that’s where the confusion is?

MatColgrove Moderator
July 29

These last two paragraphs clarified the situation for me: there is a CUDA compiler and a separate HPC compiler. LAMMPS expects to find the CUDA compiler and its auxiliary files. When I set all the environment parameters (including $PATH) to point to the 10.2 CUDA branch, LAMMPS built without errors.
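For anyone landing here later, the environment I ended up with looked roughly like the sketch below. The CUDA_HOME path is the 10.2 tree quoted above; the bin and lib64 subdirectory names follow the usual CUDA toolkit layout and are an assumption, so check them against your own install.

```shell
# Assumed location of the bundled CUDA 10.2 tree (path quoted earlier in the thread).
CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/cuda/10.2
export CUDA_HOME

# Put that tree's nvcc and libraries ahead of any others.
# The bin and lib64 names follow the usual CUDA toolkit layout (an assumption here).
PATH="$CUDA_HOME/bin:$PATH"
LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH

echo "PATH now starts with: ${PATH%%:*}"
```

These settings only last for the current shell session, so they need to be in place in the same shell that runs cmake.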

Thanks for your help on this.