Nvc++ stdpar compilation and linking problems

I’m trying to get a very simple pstl example working with GPU acceleration:

#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <execution>
#include <algorithm>

int main(void) {
    std::vector<int> data(10000000);
    std::fill_n(std::execution::par_unseq, data.begin(), data.size(), -1);
    puts("Hello World!!!");
    return EXIT_SUCCESS;
}

After looking around, I was able to find the specific set of compilation and linking arguments to get this compiling:

nvc++ -fast -g -Wall -stdpar    -c -o nvidia_pstl_test.o nvidia_pstl_test.cpp
nvc++ -o nvidia_pstl_test nvidia_pstl_test.o -cuda -lcudanvhpc101

There are a couple of problems here:

  1. The resulting binary segfaults.

     user@user-linux:~/eclipse-workspace/nvidia_pstl_test$ ./nvidia_pstl_test 
     Segmentation fault (core dumped)

     I’m using a GTX 1080 Ti with the proprietary NVIDIA drivers on Ubuntu 20.04, and I have no idea what’s wrong. Here’s the nvidia-smi output:

     user@user-linux:~/eclipse-workspace/nvidia_pstl_test$ nvidia-smi
     Wed Mar 17 11:54:53 2021       
     | NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                               |                      |               MIG M. |
     |   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
     |  0%   34C    P8    13W / 300W |    205MiB / 11170MiB |      0%      Default |
     |                               |                      |                  N/A |
     | Processes:                                                                  |
     |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
     |        ID   ID                                                   Usage      |
     |    0   N/A  N/A       981      G   /usr/lib/xorg/Xorg                 60MiB |
     |    0   N/A  N/A      1510      G   /usr/lib/xorg/Xorg                125MiB |
     |    0   N/A  N/A      1639      G   /usr/bin/gnome-shell                9MiB |

  2. I’m not sure if I missed it, but nowhere in the stdpar docs does it talk about the -cuda flag or having to link against cudanvhpc101. I spent a bunch of time googling and grepping to get to the point where the build no longer fails with a linker error.

Okay, so it turns out the proper way to build and link with GPU pstl enabled is:

nvc++ -fast -g -Wall -stdpar    -c -o nvidia_pstl_test.o nvidia_pstl_test.cpp
nvc++ -o nvidia_pstl_test nvidia_pstl_test.o -stdpar

Just pass -stdpar in both the compile step and the link step. With that, neither -cuda nor -lcudanvhpc101 has to be given by hand.
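
For anyone who wants more than "it no longer segfaults", here is a minimal sketch of a variant of the test above that also checks the result of the parallel fill. The helper name fill_and_verify is hypothetical (it is not from the original program), and the sketch assumes only the standard C++17 parallel algorithms, nothing specific to nvc++.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <vector>

// Hypothetical helper, not part of the original test program:
// fills `count` ints with `value` using a parallel algorithm, then
// checks (also with a parallel algorithm) that every element was written.
bool fill_and_verify(std::size_t count, int value) {
    std::vector<int> data(count);

    // Same call as in the original example, with the fill value as a parameter.
    std::fill_n(std::execution::par_unseq, data.begin(), data.size(), value);

    // Returns true only if all elements equal `value` (trivially true for count == 0).
    return std::all_of(std::execution::par_unseq, data.begin(), data.end(),
                       [value](int x) { return x == value; });
}
```

Calling fill_and_verify(10000000, -1) from main and printing the result, built with -stdpar in both steps as shown above, should give a quick sanity check that the algorithms actually ran and produced the expected data.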