Can't run simple std::par program

kmeagher · May 13, 2024, 4:11pm

I am trying to get started writing C++ with std::par for parallel execution. Based on the documentation, I wrote the following program:

#include <vector>
#include <iostream>
#include <algorithm>
#include <execution>
#include <cstdint>  // uint32_t

using namespace std;

struct squared {
  void operator()(float& x) const { x = x * x; }
};

void square_all(std::vector<float>& v) {
  std::for_each(std::execution::par_unseq, v.begin(), v.end(),squared{});
}

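int main(){
    // Fill v1 with 0, 0.5, 1, ..., 5.
    vector<float> v1;
    for (uint32_t i=0; i<=10; i++){
        v1.push_back(float(i)/2);
    }
    // Print the input values.
    for(auto x : v1) {
        cout<<x<<" ";
    }
    cout << "\n";
    // Square every element in place, then print the result.
    square_all(v1);
    for(auto x : v1) {
        cout<<x<<" ";
    }
    cout<< "\n";
}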

It compiles fine with both clang and nvc++ without -stdpar, but when I compile and run it with -stdpar I get this:

$ nvc++ -stdpar for_each.cpp -o for_each && ./for_each 
malloc: cuMemMallocManaged returns error code 801 for new pool allocation
new: call to cuMemAllocManaged returned error 801 (CUDA_ERROR_NOT_SUPPORTED): operation not supported
Segmentation fault (core dumped)

Not sure how to proceed from here. CUDA programs compile and run just fine with nvcc on this node.

nvidia-smi 
Mon May 13 16:09:37 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID A100X-20C                 On  | 00000000:04:00.0 Off |                    0 |
| N/A   N/A    P0              N/A /  N/A |      0MiB / 20480MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

MatColgrove · May 13, 2024, 4:35pm

It appears you’re using a virtual GPU (vGPU), which has CUDA Unified Memory (UM) disabled by default.

STDPAR relies on UM to handle the implicit data movement between the host and device copies of the vector.

If you can, try enabling UM per the instructions found in section 6.4 of https://docs.nvidia.com/grid/latest/pdf/grid-vgpu-user-guide.pdf

kmeagher · May 13, 2024, 10:00pm

Thanks, I'll talk to the admin.
