After seeing the post about stdpar, I wanted to try it out on my Jetson Xavier NX devkit.
To do so, I installed NVIDIA HPC SDK version 20.7 with
$ wget https://developer.download.nvidia.com/hpc-sdk/nvhpc-20-7_20.7_arm64.deb \
https://developer.download.nvidia.com/hpc-sdk/nvhpc-2020_20.7_arm64.deb
$ sudo dpkg -i ./nvhpc-20-7_20.7_arm64.deb ./nvhpc-2020_20.7_arm64.deb
This means I still have my regular system installation of CUDA 10.2 in /usr/local/. The fresh HPC SDK installation is in /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin, and its compiler reports itself as
nvc++ 20.7-0 linuxarm64 target on aarch64 Linux
I also see that these packages add another CUDA installation in /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0:
$ /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:42_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0
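With two toolkits on the same machine, it can help to see side by side which directories actually contain an nvcc and which release each one reports. The following is only a rough sketch: report_nvcc is a hypothetical helper, and /usr/local/cuda-10.2 is an assumed location for the system toolkit (the exact directory under /usr/local/ may differ).

```shell
# Hypothetical helper: for each candidate toolkit directory, report whether
# it contains an executable nvcc and, if so, the release line it prints.
report_nvcc() {
    for dir in "$@"; do
        if [ -x "$dir/bin/nvcc" ]; then
            printf '%s: %s\n' "$dir" "$("$dir/bin/nvcc" --version | grep -i 'release')"
        else
            printf '%s: no nvcc found\n' "$dir"
        fi
    done
}

# /usr/local/cuda-10.2 is an assumption; the second path is the one shown above.
report_nvcc /usr/local/cuda-10.2 /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0
```

This only shows which toolkits are installed. It does not show which one nvc++ picks up by default.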
To use the HPC SDK I then set up my environment with
export PATH=$PATH:/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib
Note that I did not add the CUDA 10.2 installation (/usr/local) to either of these two variables.
When I now try to compile a simple test program (it still has some traces of the longer version from which I extracted it)
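Since the two export lines only append to whatever was already set, a quick way to double-check what the variables really contain could look like the sketch below. list_matching is a hypothetical helper, not part of the HPC SDK, and the keywords cuda and hpc_sdk are just a guess at what is worth searching for.

```shell
# Hypothetical helper: split a colon-separated list (such as $PATH) into
# one entry per line and print the entries matching an extended regex.
# "|| true" keeps the exit status at 0 when nothing matches.
list_matching() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -i -E -- "$2" || true
}

echo "PATH entries mentioning cuda or hpc_sdk:"
list_matching "$PATH" 'cuda|hpc_sdk'
echo "LD_LIBRARY_PATH entries mentioning cuda or hpc_sdk:"
list_matching "${LD_LIBRARY_PATH:-}" 'cuda|hpc_sdk'
```

If an entry under /usr/local shows up here, it was already in the environment before the exports above were added.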
#include <boost/random/taus88.hpp>
#include <execution>
#include <limits>
#include <numeric>
#include <type_traits>
#include <vector>
#include <iostream>
boost::random::taus88 random_gen;
int main() {
  std::vector<float> store(256, 0);
  for (std::size_t i = 0; i < 256; ++i) {
    store[i] = random_gen();
  }
  using std::min;
  auto retval = std::reduce(std::execution::par, store.begin(),
                            store.end(),
                            std::numeric_limits<typename std::decay_t<decltype(store)>::value_type>::max(),
                            [](auto a, auto b) { return min(a, b); });
  std::cout << retval << '\n';
  return 0;
}
the compilation fails as follows:
$ nvc++ test.cc -o test -std=c++17 -stdpar
nvc++-Error-CUDA version 10.2 is not available in this installation. Please read documentation for CUDA_HOME to solve this issue
Since the HPC SDK came with CUDA 11.0 directories, and after some searching on the forum, I tried to manually set the CUDA version to 11.0:
$ nvc++ test.cc -o test -std=c++17 -stdpar -gpu=cuda11.0
The compilation succeeds, but I can’t run the executable:
$ ./test
terminate called after throwing an instance of 'thrust::system::system_error'
what(): after reduction step 1: cudaErrorInvalidDevice: invalid device ordinal
zsh: abort (core dumped) ./test
I suspect the problem could lie in any of several places (installation, environment setup, compilation options, …). Any pointers on where to start reading or what to try?
Thanks in advance,
Paul