I am trying to get started writing C++ with stdpar (the standard parallel algorithms) for parallel execution. Based on the documentation, I wrote the following program:
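#include <vector>
#include <iostream>
#include <algorithm>
#include <execution>
#include <cstdint>  // uint32_t
using namespace std;

// Function object that squares its argument in place.
struct squared {
    void operator()(float& x) const { x = x * x; }
};

// Square every element, allowing parallel and vectorized execution.
void square_all(std::vector<float>& v) {
    std::for_each(std::execution::par_unseq, v.begin(), v.end(), squared{});
}

int main() {
    // Fill v1 with 0, 0.5, 1, ..., 5.
    vector<float> v1;
    for (uint32_t i = 0; i <= 10; i++) {
        v1.push_back(float(i) / 2);
    }
    // Print the input values.
    for (auto x : v1) {
        cout << x << " ";
    }
    cout << "\n";
    square_all(v1);
    // Print the squared values.
    for (auto x : v1) {
        cout << x << " ";
    }
    cout << "\n";
}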
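To rule out a bug in my own code, here is a minimal sketch of a sequential check that I would expect to agree with the program above. The helper names `square_all_reference` and `nearly_equal` are hypothetical (they are not part of the program or of any library), and the tolerance is an arbitrary assumption:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: squares every element with a plain sequential loop,
// so it does not depend on -stdpar or on any execution policy.
std::vector<float> square_all_reference(const std::vector<float>& in) {
    std::vector<float> out;
    out.reserve(in.size());
    for (float x : in) {
        out.push_back(x * x);
    }
    return out;
}

// Hypothetical helper: true if both vectors have the same size and every
// pair of elements differs by at most `tol` (an assumed tolerance).
bool nearly_equal(const std::vector<float>& a, const std::vector<float>& b,
                  float tol = 1e-6f) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) return false;
    }
    return true;
}
```

In `main`, the idea would be to compute `square_all_reference(v1)` before calling `square_all(v1)` and then compare the two results with `nearly_equal`.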
It compiles fine with both clang and nvc++ without -stdpar, but when I build it with -stdpar and run it, I get this:
$ nvc++ -stdpar for_each.cpp -o for_each && ./for_each
malloc: cuMemMallocManaged returns error code 801 for new pool allocation
new: call to cuMemAllocManaged returned error 801 (CUDA_ERROR_NOT_SUPPORTED): operation not supported
Segmentation fault (core dumped)
Not sure how to proceed from here. CUDA programs compile and run just fine with nvcc on this node.
$ nvidia-smi
Mon May 13 16:09:37 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID A100X-20C                 On  | 00000000:04:00.0 Off |                    0 |
| N/A   N/A    P0               N/A / N/A |      0MiB / 20480MiB |      0%      Default |
|                                         |                      |             Disabled |
+---------------------------------------------------------------------------------------+