Hi,
I have a simple program that copies elements from one array of std::array<float, 3> to another inside a parallel std::for_each loop:
#include <execution>
#include <algorithm>
#include <array>
#include <vector>

using real_t = float;
using Vector3D = std::array<real_t, 3>;

int main() {
    Vector3D* v1 = new Vector3D[10];
    Vector3D* v2 = new Vector3D[10];
    std::vector<size_t> indices{0, 1, 2};
    // copy the contents of v1 to v2 for the indices in indices
    std::for_each(std::execution::par_unseq, indices.begin(), indices.end(), [=](size_t i) {
        v2[i] = v1[i];
    });
}
It compiles but I get the following error when I run the code (I am using nvc++ 22.11):
terminate called after throwing an instance of 'thrust::system::system_error'
what(): for_each: failed to synchronize: cudaErrorMisalignedAddress: misaligned address
Aborted (core dumped)
Note that the code runs fine if I use an array of doubles instead (std::array<double, 3>). The only way I have found to make it work with floats is to copy the element into a temporary variable first, instead of assigning directly from v1 to v2:
#include <execution>
#include <algorithm>
#include <array>
#include <vector>

using real_t = float;
using Vector3D = std::array<real_t, 3>;

int main() {
    Vector3D* v1 = new Vector3D[10];
    Vector3D* v2 = new Vector3D[10];
    std::vector<size_t> indices{0, 1, 2};
    // copy the contents of v1 to v2 for the indices in indices
    std::for_each(std::execution::par_unseq, indices.begin(), indices.end(), [=](size_t i) {
        const auto element = v1[i];
        v2[i] = element;
    });
}
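In case it helps with the diagnosis, here is a minimal sketch of the one layout difference I can see between the float and double cases. It only computes sizes and byte offsets on the host. I do not know whether this is what actually triggers the misaligned address error. The helper names (element_offset, starts_on_boundary) and the aliases Float3 and Double3 are made up for this illustration, and the sketch assumes std::array adds no padding, which holds on the common standard library implementations.

```cpp
#include <array>
#include <cstddef>

using Float3  = std::array<float, 3>;   // assumed 12 bytes, no padding
using Double3 = std::array<double, 3>;  // assumed 24 bytes, no padding

// Hypothetical helper: byte offset of element i in a contiguous array of T.
template <typename T>
constexpr std::size_t element_offset(std::size_t i) {
    return i * sizeof(T);
}

// Hypothetical helper: true if element i starts on a multiple of `alignment`
// bytes, assuming the base pointer of the array is itself aligned that way.
template <typename T>
constexpr bool starts_on_boundary(std::size_t i, std::size_t alignment) {
    return element_offset<T>(i) % alignment == 0;
}

// Float3 elements sit at offsets 0, 12, 24, ... so element 1 does not start
// on an 8-byte boundary; Double3 elements sit at 0, 24, 48, ... and always do.
static_assert(!starts_on_boundary<Float3>(1, 8), "Float3[1] is at offset 12");
static_assert(starts_on_boundary<Double3>(1, 8), "Double3[1] is at offset 24");
```

If the generated device code copied a Float3 with a load wider than 4 bytes, element 1 would be a candidate for a misaligned access, while the Double3 case would not be affected. This is only a guess on my part.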
Is there something I have misunderstood?
Regards,
Raf