CUDA version not available message with nvc++ on Ubuntu

I have recently installed version 20.7 of the HPC SDK on my 64-bit Ubuntu 20.04 system with a TITAN X GPU. When I try to compile a simple C++17 test program, I get a message such as “Error-CUDA version 10.2 is not available in this installation. Please read documentation for CUDA_HOME to solve this issue”. My system has the nvidia-driver-440 and nvidia-cuda-toolkit packages installed. I note that nvcc is located in the /usr/bin directory and reports 10.1 as the release number.

https://docs.nvidia.com/hpc-sdk/index.html
https://docs.nvidia.com/hpc-sdk/compilers/c++-parallel-algorithms/index.html
https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#cuda-toolkit-versions

I began by following the blog post “Accelerating Standard C++ with GPUs Using stdpar”, and then worked through some suggestions via the three links above. Whatever I try, I can only ever change which CUDA version the message says is unavailable. The command I start with is:

/opt/nvidia/hpc_sdk/Linux_x86_64/20.7/compilers/bin/nvc++ -stdpar help.cpp

I tried adding -gpu=cuda10.1 and/or -gpu=cc60 (Pascal). I also tried setting CUDA_HOME to /usr/bin. Recognising that there are a few CUDA versions included with the SDK, I also tried the environment commands below first, and then a few other -gpu=cudaX.Y options.

NVARCH=`uname -s`_`uname -m`; export NVARCH
NVCOMPILERS=/opt/nvidia/hpc_sdk; export NVCOMPILERS
MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/20.7/compilers/man; export MANPATH
PATH=$NVCOMPILERS/$NVARCH/20.7/compilers/bin:$PATH; export PATH

nvaccelinfo reports the CUDA Driver Version as 10020.

Here is one C++ program which can trigger the error:

#include <algorithm>
#include <iostream>
#include <vector>
#include <execution>

int main()
{
  std::vector<int> v = {5,100,3,6,6,109,64,234,656,25,7,44,6,232,2};
  std::sort(std::execution::par_unseq, v.begin(), v.end());
  std::cout << v[0] << '\n';
  return 0;
}

Any help would be much appreciated.

Paul

Hi Paul,

Which bundle of the NVIDIA HPC SDK did you download? There are two: one with just the latest CUDA version (11.0), and one that includes the current and past two releases of CUDA (11.0, 10.2, and 10.1). My guess is that you installed the CUDA 11.0-only bundle.

Note that you don’t have to download the larger bundle; instead you can use your own local install of CUDA by setting the CUDA_HOME environment variable to point to your CUDA installation. In your case, it sounds like you have the CUDA 10.1 SDK installed, so you’ll also need to tell the compiler to target CUDA 10.1 on the compile line (via the “-gpu=cuda10.1” flag). By default, the compiler will use the CUDA version of your driver.

To summarize some of your options, you can do one of the following:

1. Download and install the HPC SDK bundle which includes the older CUDA versions.
2. Update your CUDA driver to CUDA 11.0.
3. Install the CUDA 10.2 SDK and set CUDA_HOME to point to this install.
4. Set CUDA_HOME to your CUDA 10.1 install and add the flag “-gpu=cuda10.1”.
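For that last option, the commands would look something like this (a sketch; /usr/local/cuda-10.1 is an example path, adjust to wherever your CUDA 10.1 SDK actually lives):

```shell
# Point the compiler at a local CUDA 10.1 installation (example path)
export CUDA_HOME=/usr/local/cuda-10.1
# and tell nvc++ to target that CUDA version explicitly
nvc++ -stdpar -gpu=cuda10.1 -std=c++17 help.cpp
```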

Hope this helps,
Mat

Hi Mat,

Thanks for getting back so quickly. It was indeed the NVIDIA HPC SDK version which includes just the latest CUDA release (nvhpc_2020_207_Linux_x86_64_cuda_11.0.tar.gz). When I use the “-gpu=cuda10.1” flag, the error message then references 10.1 instead of 10.2:

$ /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/compilers/bin/nvc++ -gpu=cuda10.1 -stdpar help.cpp 
nvc++-Error-CUDA version 10.1 is not available in this installation. Please read documentation for CUDA_HOME to solve this issue

Your 4th option suggests setting CUDA_HOME. If I set this to the location of nvcc (i.e. /usr/bin), it changes nothing. Does the compiler use the CUDA_HOME variable only to locate nvcc? (Otherwise I might be able to create another directory and use symbolic links.)

Thanks again,
Paul

No, CUDA_HOME should point to the base directory for your CUDA SDK installation, which typically is located in “/opt/cuda…” or “/usr/local/cuda…”. Though users can install CUDA in any directory they choose, so it may be elsewhere depending upon where you or your admin installed it.

We don’t use the “nvcc” driver directly but rather the underlying CUDA back-end compiler (libnvvm) and CUDA runtime libraries.
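In other words, a usable CUDA_HOME is the directory that contains the SDK’s include/ and nvvm/ subtrees, not the directory containing nvcc. A quick sanity check might look like this (the layout shown is typical of a CUDA SDK install, but treat the exact paths as assumptions):

```shell
# A CUDA base directory should contain the CUDA headers
# and the libnvvm back-end library that nvc++ actually uses
ls "$CUDA_HOME"/include/cuda.h
ls "$CUDA_HOME"/nvvm/lib64/libnvvm.so*
```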

Thanks again Mat. I continued with your other suggestions. I have the latest versions of the CUDA driver and CUDA SDK packages available on Ubuntu 20.04, and so the last option which I can look at is to:

Download and install the HPC SDK which includes the older CUDA versions.

I couldn’t find information on how to uninstall the (nvhpc_2020_207_Linux_x86_64_cuda_11.0) HPC SDK (perhaps just remove the directory?), but I can of course install nvhpc_2020_207_Linux_x86_64_cuda_multi in a custom location (I chose /opt/nvidia/hpc_sdk_multi). After doing this my compilation command produces a different error message:

$ /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/compilers/bin/nvc++ -gpu=cuda10.1 -gpu=cc60 -stdpar ~/projects/cpp-std-parallel/help.cpp
"/opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/compilers/include/nvhpc/stdpar_config.hpp", line 19: catastrophic error: #error directive: An unexpected version of Thrust which is incompatible with NVC++ is on the include path. NVC++ includes its own version of Thrust; no user-supplied version should be needed.

I then tried removing the libthrust-dev package, but removing it also removes nvidia-cuda-toolkit (and nvidia-cuda-dev). (I don’t want to do this, but I’m curious, so I do it; I can always reinstall.) After that, I add -std=c++17 and it works; I note that the -gpu=cuda10.1 and -gpu=cc60 flags aren’t necessary here (CUDA version 10.2 seems to be the default).

$ /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/compilers/bin/nvc++ -stdpar -std=c++17 help.cpp

I still need CUDA installed, so I test the versions included with the SDK. Of the three shipped with the HPC SDK, versions 10.1 and 10.2 produce the ol’ “unsupported GNU version! gcc versions later than 8 are not supported!” error, but version 11.0 works:

$ /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/11.0/bin/nvcc my_simple_test.cu

…but not with anything substantial, e.g. matrixMul from https://github.com/NVIDIA/cuda-samples/archive/v11.0.tar.gz:

$ /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/11.0/bin/nvcc -I ../../Common matrixMul.cu
$ ./a.out 
[Matrix Multiply Using CUDA] - Starting...
CUDA error at ../../Common/helper_cuda.h:793 code=35(cudaErrorInsufficientDriver) "cudaGetDeviceCount(&device_count)"

This sends me back to the “gcc versions later than 8 are not supported!” error. I install g++-8 and put symbolic links named gcc and g++ in each of the three version-numbered CUDA bin directories in the HPC SDK. 10.1 works, but so does 10.2, so I go with the newer version (11.0 fails at runtime as before):

$ ln -s /usr/bin/gcc-8 /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/10.2/bin/gcc
$ ln -s /usr/bin/g++-8 /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/10.2/bin/g++
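(An alternative to these symlinks might be nvcc’s -ccbin option, which selects the host compiler explicitly; something like the following, assuming g++-8 is at /usr/bin/g++-8 — I haven’t tested this route:)

```shell
# Use g++-8 as the host compiler instead of whatever gcc/g++ is on PATH
/opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/10.2/bin/nvcc -ccbin /usr/bin/g++-8 my_simple_test.cu
```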

This suits me for now. (I’m glad to stick with package managed drivers; I’ve lost too much time on NVIDIA proprietary drivers in the past.)

The only questions I would ask now are: what is the correct way to uninstall the SDK? And with CUDA 10.2 from the SDK, would CUDA_HOME be /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/10.2/bin?

Cheers,
Paul

what is the correct way to uninstall the SDK?

Assuming you mean the HPC SDK, then you can just delete it. i.e. “rm -rf /opt/nvidia/hpc_sdk_multi”.

And with CUDA 10.2 from the SDK, would CUDA_HOME be

I don’t set CUDA_HOME unless I’m pointing to a CUDA SDK install outside of the HPC SDK install tree so haven’t done this, but would think it would be “/opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/10.2” (i.e. drop the bin).
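So, if you do need to set it, that would be:

```shell
# the CUDA root inside the HPC SDK tree, not its bin/ subdirectory
export CUDA_HOME=/opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.7/cuda/10.2
```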

-Mat


I am struggling with a related issue. I am on CentOS 7, with CUDA 10.1, 10.2, 11 installed.
When I try to compile the code above from the OP, I get:
nvc++ cudapar.cpp -std=c++17 -O3 -stdpar -o gpu_tes -gpu=cuda10.2
#error An unexpected version of Thrust which is incompatible with NVC++ is on the include path. NVC++ includes its own version of Thrust; no user-supplied version should be needed.

My include paths, as reported by:
echo | gcc -E -Wp,-v -

/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/comm_libs/nvshmem/include
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/comm_libs/nccl/include
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/comm_libs/mpi/include
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/math_libs/include
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/cuda/include
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/include
/usr/local/include
/usr/include
(none of the system paths contain a directory called “thrust”)

When I use CUDA 10.2 (setting CUDA_HOME there), and pass -gpu=cuda10.2, I get the same issue. I don’t know what version of thrust might get included that causes the error…

Hmm, not sure why this is happening. Can you try adding the flag “--trace_includes” so we can see which Thrust is getting picked up? In particular, we’re looking for the “thrust/version.h” header file.

Thanks,
Mat
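For anyone following along, the suggested check might look like this (assuming the flag is spelled --trace_includes and its output can be filtered with grep):

```shell
# Trace every header the front end opens, then filter for Thrust's version header
nvc++ --trace_includes -stdpar -std=c++17 cudapar.cpp 2>&1 | grep 'thrust/version.h'
```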

This one
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/compilers/include/nvhpc/stdpar_config.hpp
checks for
#if THRUST_VERSION < 100910
#error An unexpected version of Thrust which is incompatible with NVC++ is on the include path. NVC++ includes its own version of Thrust; no user-supplied version should be needed.
#endif

and includes
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/cuda/include/thrust/version.h
which comes through a symlink to
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/cuda/11.0/include/thrust/version.h
which does
#define THRUST_VERSION 100909

In contrast to this one:
/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/compilers/include-stdpar/thrust/version.h
which does have
#define THRUST_VERSION 101000
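For reference, THRUST_VERSION appears to encode major*100000 + minor*100 + subminor, so 100909 is Thrust 1.9.9 (just below the 1.9.10 minimum the check demands) and 101000 is 1.10.0. A small decoder, assuming that encoding:

```python
def decode_thrust_version(v: int) -> str:
    """Split a THRUST_VERSION integer into its major.minor.subminor parts."""
    major = v // 100000
    minor = (v // 100) % 1000
    subminor = v % 100
    return f"{major}.{minor}.{subminor}"

print(decode_thrust_version(100909))  # the version.h reached via the cuda/ symlink -> 1.9.9
print(decode_thrust_version(101000))  # the version.h under include-stdpar -> 1.10.0
```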

If I compile with
nvc++ -I/home/shared/software/cuda/hpc_sdk/Linux_x86_64/20.9/compilers/include-stdpar vector_copy_gpu.cpp -std=c++17 -O3 -o gpu_test -stdpar

then things work! Yay :) I’m not sure why it picks up the wrong one first.