Using nvc++'s -stdpar on Jetson Xavier NX

After seeing the post about stdpar, I wanted to try it out on my Jetson Xavier NX devkit.

To do so, I installed NVIDIA HPC SDK version 20.7 with

$ wget \
$ sudo dpkg -i ./nvhpc-20-7_20.7_arm64.deb ./nvhpc-2020_20.7_arm64.deb

This means I now have my regular system installation of CUDA 10.2 in /usr/local/ alongside the fresh HPC SDK installation, whose compilers live in /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin. The nvc++ there reports

nvc++ 20.7-0 linuxarm64 target on aarch64 Linux

I also see that these instructions add another CUDA installation in /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0

$ /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:42_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0

To use the HPC SDK I then set up my environment with

export PATH=$PATH:/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib

Note that neither of these two variables mentions the CUDA 10.2 installation (/usr/local).

When I now try to compile a simple test program (it still has some traces of the longer version from which I extracted it)

#include <boost/random/taus88.hpp>
#include <algorithm>
#include <cstddef>
#include <execution>
#include <limits>
#include <numeric>
#include <type_traits>
#include <vector>
#include <iostream>

boost::random::taus88 random_gen;

int main() {
  // Fill the buffer with pseudo-random values.
  std::vector<float> store(256, 0);
  for (std::size_t i = 0; i < 256; ++i) {
    store[i] = random_gen();
  }
  using std::min;
  // Parallel minimum: start from the largest float and reduce with min.
  auto retval = std::reduce(std::execution::par, store.begin(), store.end(),
      std::numeric_limits<typename std::decay_t<decltype(store)>::value_type>::max(),
      [](auto a, auto b) { return min(a, b); });

  std::cout << retval << '\n';
  return 0;
}

The compilation fails as follows:

$ nvc++ -o test -std=c++17 -stdpar
nvc++-Error-CUDA version 10.2 is not available in this installation. Please read documentation for CUDA_HOME to solve this issue

Since the HPC SDK came with CUDA 11.0 directories, and after some searching on the forum, I tried to manually target CUDA 11.0:

$ nvc++ -o test -std=c++17 -stdpar -gpu=cuda11.0

The compilation succeeds, but I can’t run the executable:

$ ./test
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  after reduction step 1: cudaErrorInvalidDevice: invalid device ordinal
zsh: abort (core dumped)  ./test

I suspect there are multiple points where the problem could be (installation, environment setup, compilation options, …). Any pointers where to start reading / what to try / …?

Thanks in advance,

Hi Paul,

I just tested your code on an Xavier system here, which also has a CUDA 10.2 driver, and it ran fine. However, I can replicate your error if I compile the code to target CUDA 11. The problem is that a binary built with CUDA 11 is not backwards compatible, so it can’t be run on a system with a CUDA 10.2 driver.

You should be able to solve this by either updating the system to a CUDA 11 driver or configuring nvc++ to use a CUDA 10.2 installation.

Note that the NVIDIA HPC SDK has two different downloads: one with just the latest CUDA version (11.0), which looks to be what you downloaded, and a second that also includes the two previous CUDA versions (10.2, 10.1). Given the extra size of the download package, we didn’t want folks who only needed CUDA 11 to have to download older versions. In your case, though, you should consider downloading the larger package.

Alternatively, you can set the environment variable CUDA_HOME to the base directory of your CUDA 10.2 install. In that case, nvc++ will use this install. Here’s a link to the docs that the error message above is referring to:

Best Regards,

Hi Mat,

thanks for your explanation.

Somehow I’m not successful when running

$ export CUDA_HOME=/usr/local/cuda-10.2
$ nvc++ -o test -std=c++17 -stdpar
nvc++-Error-CUDA version 10.2 is not available in this installation. Please read documentation for CUDA_HOME to solve this issue
$ which nvc++

However, what I tried next was adding a symlink:

$ sudo ln -s /usr/local/cuda-10.2 /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/10.2

With the symlink in place, compilation succeeds. I then poked around a bit more to see how I had misunderstood the usage of CUDA_HOME and discovered that the following also works for me:

$ nvc++ CUDA_HOME=/usr/local/cuda-10.2 -o test -std=c++17 -stdpar

(i.e. CUDA_HOME is assigned as an argument to nvc++ instead of as an environment variable.)

Either way, I am now able to compile and run code built with nvc++.


It should work as an environment variable, so I filed a problem report, TPR #28983. The compiler fails to recognize CUDA_HOME as an environment variable, but only with “-stdpar” on ARM. It works as expected with “-acc” on ARM, and “-stdpar” is fine on x86.

Glad you were able to get it to work by adding CUDA_HOME on the command line.

Hi Paul,

I should have mentioned before that technically we don’t support CUDA 10.2 on ARM. The minimum CUDA version is 11.0, or driver 450. See:

It may work, but since 20.7 was the first release to support ARM, we didn’t go back and validate on older CUDA versions.


Hi Mat,

Okay, I’ll look into upgrading CUDA then.