Sorting std::tuple<float, float, float> fails on the GPU due to misaligned address

I further modified my example from "Lexicographic comparison of std::tuple on GPU fails with C++20" to use 3D coordinates. To avoid the previous problem, I now use C++17, with which the previous example worked on the GPU without a problem. Now, when executing, I get:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  merge_sort: failed to synchronize: cudaErrorMisalignedAddress: misaligned address

I’m on “nvc++ 22.3-0 64-bit target on x86-64 Linux” with “gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0” on a GTX 1070. I compile with, e.g.:

nvc++ -O3 -std=c++17 -stdpar=gpu -gpu=cc61,cuda11.6

weld_vertices3D.cpp (3.4 KB)
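
The attachment is not reproduced in this thread. As a rough sketch of the kind of code involved, assuming the vertices are held in a `std::vector` and sorted with the tuple's default lexicographic `operator<`, it might look like the following. The names `Vertex` and `sort_vertices` are made up for illustration and are not taken from the attached file.

```cpp
#include <algorithm>
#include <execution>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical alias; the attached file may name things differently.
using Vertex = std::tuple<float, float, float>;

// Sorts vertices lexicographically (x, then y, then z) using
// std::tuple's built-in operator<. The execution policy is a template
// parameter so the same code can be checked sequentially on the host.
// `sort_vertices` is an illustrative name, not taken from the attachment.
template <class Policy>
void sort_vertices(Policy&& policy, std::vector<Vertex>& vertices) {
    std::sort(std::forward<Policy>(policy), vertices.begin(), vertices.end());
}
```

Calling `sort_vertices(std::execution::par, vertices)` and building with the `-stdpar=gpu` command above is what would offload the sort to the GPU. Whether this reduced sketch triggers the same misaligned address error as the attached file has not been verified.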

Thanks paleonix,

I was able to reproduce this error irrespective of the GNU version or C++17 vs C++20. I've added problem report TPR #31964 and sent it to engineering for further investigation.

-Mat
