Error when trying to use stdpar in nvc++

Hi all,

I got an error using stdpar on the GPU. The same error was mentioned in the topic

But I did not get a clue from it to fix my problem.

I wrote a simple test code to try out stdpar in nvc++.

The code is a simple relaxation solver for the 2D Laplace equation, i.e., the value at each grid point is the average of its four neighboring grid points. Here is my code:

#include <iostream>
#include <vector>
#include <ranges>
#include <execution>
#include <fstream>
#include <algorithm>
#include <experimental/mdspan>


auto main() -> int {
    int size = 50;   // array size = size*size
    int niter = 100;
    std::vector<double> A(size * size, 0.0);
    auto A_v = std::experimental::mdspan(A.data(), size, size); // create view

    // Initialization: set one side of the rectangular grid in A to unity
    for (int i = 0; i < size; ++i) {
        A_v(i, 0) = 1.0; // Set to unity on one side
    }
    std::vector<double> B(A);
    auto B_v = std::experimental::mdspan(B.data(), size, size); // create view

    // View for use with for_each
    auto v = std::ranges::views::cartesian_product(
        std::ranges::views::iota(1, size - 1),
        std::ranges::views::iota(1, size - 1));

    // Iteration
    for (int i = 0; i < niter; i++) {
        std::for_each(std::execution::par, std::begin(v), std::end(v),
            [=](auto idx) {
                auto [i, j] = idx;
                B_v(i, j) = 0.25 * (A_v(i - 1, j) + A_v(i + 1, j) + A_v(i, j - 1) + A_v(i, j + 1));
            });
        std::swap(A_v, B_v);
    }

    // Open file for writing
    std::ofstream outputFile("output.txt");
    if (!outputFile.is_open()) {
        std::cerr << "Error: Unable to open output file!" << std::endl;
        return 1;
    }

    // Write contents of A to file
    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            outputFile << i << " " << j << " " << A_v(i,j) << std::endl;
        }
        outputFile << std::endl; // Skip a line after each row
    }

    // Close file
    outputFile.close();

    // Confirmation message
    std::cout << "Output file 'output.txt' generated successfully." << std::endl;



}

When I compiled with nvc++ -std=c++23, I could run the executable and got reasonable results.

However, when I tried to run on the GPU, compiling with nvc++ -std=c++23 -stdpar=gpu -gpu=cc89, I got a runtime error:
terminate called after throwing an instance of 'thrust::THRUST_200802_SM_89_NVHPC_NS::system::system_error'
what(): parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)

I am new to nvc++ and I don’t know what I am missing, even for such a short code.
Thanks in advance to anyone who can help.

Hi huccpp,

Some features such as “cartesian_product” require a system with HMM (Heterogeneous Memory Management) support, compiled to use CUDA Unified Memory, so the GPU can directly access host memory. That’s likely what’s happening here.

I’m presuming you’re not on a system with HMM, like a Grace Hopper? If not, you might try something like the following:

#include <iostream>
#include <vector>
#include <ranges>
#include <execution>
#include <fstream>
#include <algorithm>
#include <experimental/mdspan>

int main() {
    int size = 50;   // array size = size*size
    int niter = 100;
    std::vector<double> A(size * size, 0.0);
    std::vector<double> B(A);

    auto A_v = std::experimental::mdspan(A.data(), size, size); // create view
    auto B_v = std::experimental::mdspan(B.data(), size, size); // create view

    // Set one side of the rectangular grid in A to unity
    for (int i = 0; i < size; ++i) {
        A_v(i, 0) = 1.0; // Set to unity on one side
    }

    // Parallelize using a flat index range
    auto flat_range = std::views::iota(0, (size - 2) * (size - 2));

    for (int iter = 0; iter < niter; ++iter) {
        std::for_each(std::execution::par, flat_range.begin(), flat_range.end(),
            [=](int idx) {
                int i = idx / (size - 2) + 1;
                int j = idx % (size - 2) + 1;
                B_v(i, j) = 0.25 * (A_v(i - 1, j) + A_v(i + 1, j) + A_v(i, j - 1) + A_v(i, j + 1));
            }
        );
        std::swap(A, B); // Swap underlying data
        // Recreate views to point to swapped data
        A_v = std::experimental::mdspan(A.data(), size, size);
        B_v = std::experimental::mdspan(B.data(), size, size);
    }

    // Write contents of A to file
    std::ofstream outputFile("output.txt");
    if (!outputFile.is_open()) {
        std::cerr << "Error: Unable to open output file!" << std::endl;
        return 1;
    }

    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            outputFile << i << " " << j << " " << A_v(i, j) << std::endl;
        }
        outputFile << std::endl; // Skip a line after each row
    }

    outputFile.close();
    std::cout << "Output file 'output.txt' generated successfully." << std::endl;
    return 0;
}

-Mat

Hi Mat,

Thanks a lot.

Yes, you are right that I am not on a GPU system with unified memory. I am using the GPU on my graphics card, a GeForce RTX 4070.

The code you proposed works well, and I have one more question. Inside the iteration,

why did you swap the data and then recreate the views, instead of simply swapping the views?

I tried both and found that the total squared difference between the two resulting arrays is zero.

Will they always agree, or could they differ, with the zero difference in the case I tested just being chance?

Thanks.

huccpp

Swapping the views only swaps the views, not the underlying data. This is fine if you only access the arrays via the views, but could be a problem if you try to access the data directly.

So either way works here; swapping the data just prevents a potential future issue.

Ok, I’ve got it. Thanks Mat.