Can't run a simple stdpar program

I am trying to get started writing C++ with stdpar (the standard parallel algorithms) for parallel execution. Based on the documentation, I wrote the following program:

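#include <algorithm>
#include <cstdint>   // uint32_t
#include <execution>
#include <iostream>
#include <vector>

// Squares a single element in place.
struct squared {
  void operator()(float& x) const { x = x * x; }
};

// Squares every element using the parallel, vectorized execution policy.
void square_all(std::vector<float>& v) {
  std::for_each(std::execution::par_unseq, v.begin(), v.end(), squared{});
}

int main() {
  // Fill the vector with 0, 0.5, 1, ..., 5.
  std::vector<float> v1;
  for (std::uint32_t i = 0; i <= 10; i++) {
    v1.push_back(float(i) / 2);
  }
  // Print the input values.
  for (auto x : v1) {
    std::cout << x << " ";
  }
  std::cout << "\n";
  square_all(v1);
  // Print the squared values.
  for (auto x : v1) {
    std::cout << x << " ";
  }
  std::cout << "\n";
}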

It compiles fine with both clang and nvc++ without -stdpar, but when I compile it with -stdpar and run it, I get this:

$ nvc++ -stdpar for_each.cpp -o for_each && ./for_each 
malloc: cuMemMallocManaged returns error code 801 for new pool allocation
new: call to cuMemAllocManaged returned error 801 (CUDA_ERROR_NOT_SUPPORTED): operation not supported
Segmentation fault (core dumped)

Not sure how to proceed from here. CUDA programs compile and run just fine with nvcc on this node.

nvidia-smi 
Mon May 13 16:09:37 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID A100X-20C                 On  | 00000000:04:00.0 Off |                    0 |
| N/A   N/A    P0              N/A /  N/A |      0MiB / 20480MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

It appears you’re using a virtual GPU (the GRID A100X-20C shown by nvidia-smi), and virtual devices have CUDA Unified Memory (UM) disabled by default.

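stdpar relies on UM to handle the implicit data management between the host and device copies of the vector. With UM disabled, the managed allocation (cuMemAllocManaged) fails with CUDA_ERROR_NOT_SUPPORTED, which is the error 801 in your output.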

If you can, try enabling UM per the instructions found in section 6.4 of https://docs.nvidia.com/grid/latest/pdf/grid-vgpu-user-guide.pdf

Thanks, I'll talk to the admin.