Lexicographic comparison of std::tuple on GPU fails with C++20

I modified the weld_vertices.cu Thrust example to use only STL functionality. The example works with C++17 but fails with C++20 when targeting the GPU. Only the combination -std=c++20 -stdpar=gpu fails; -std=c++20 -stdpar=multicore and -std=c++17 -stdpar=gpu both work.

The code uses the default lexicographic ordering of std::tuple<float, float>, which in C++20 is implemented with the new “spaceship operator” (the three-way comparison operator<=>). My hypothesis is therefore that there is a bug when the spaceship operator is used in GPU code. Both std::sort and std::lower_bound (inside a parallel std::transform) fail with std::execution::par_unseq when no custom comparison is passed. std::unique, which needs only equality and not ordering, seems to work.
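For reference, here is a minimal sketch of the kind of custom comparison that sidesteps the tuple's operator<=>. The names Vertex, VertexLess, sort_vertices and index_of are hypothetical and not taken from the attached file. The sketch calls the sequential overloads so it builds with any C++17 or C++20 compiler; in the -stdpar build, std::execution::par_unseq would be passed as the first argument of each algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

using Vertex = std::tuple<float, float>;

// Hypothetical comparator: explicit lexicographic "less than" built only from
// operator< on float, so std::tuple's own relational operators (and thus
// operator<=> in C++20) are never instantiated.
struct VertexLess {
    bool operator()(const Vertex& a, const Vertex& b) const {
        if (std::get<0>(a) < std::get<0>(b)) return true;
        if (std::get<0>(b) < std::get<0>(a)) return false;
        return std::get<1>(a) < std::get<1>(b);
    }
};

// Sort the vertices with the explicit comparator.
// Assumption: in the stdpar build, std::execution::par_unseq goes first.
inline void sort_vertices(std::vector<Vertex>& vertices) {
    std::sort(vertices.begin(), vertices.end(), VertexLess{});
}

// Position of x in an already sorted range, using the same comparator
// so that sort and search agree on the ordering.
inline std::size_t index_of(const std::vector<Vertex>& sorted, const Vertex& x) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), x, VertexLess{});
    return static_cast<std::size_t>(it - sorted.begin());
}
```

Passing the same comparator to both std::sort and std::lower_bound matters: if the two calls used different orderings, the lookup would give wrong indices even when each call works on its own.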

I’m on “nvc++ 22.3-0 64-bit target on x86-64 Linux” with “gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0” on a GTX 1070. I compile with e.g.

nvc++ -O3 -std=c++20 -stdpar=gpu -gpu=cc61,cuda11.6

weld_vertices.cpp (3.3 KB)

Hi paleonix,

What’s the error you’re seeing? I tried on my system but see no difference between the output with C++17 vs C++20.

% nvc++ -O3 -std=c++17 -stdpar=gpu -gpu=cuda11.6 weld_vertices.cpp -V22.3
% a.out > c17.log
% nvc++ -O3 -std=c++20 -stdpar=gpu -gpu=cuda11.6 weld_vertices.cpp -V22.3
% a.out > c20.log
% diff c17.log c20.log
%

Also, which GNU version do you have installed? I have GNU 9.3, so wondering if it’s a difference in the underlying STL.

I’m also using an A100 (cc80), but I don’t see how that would cause a difference.

-Mat

Sorry, just noticed you noted the use of GNU 11.2. Let me give that a try and see if my results change.

With C++20 I get

Before sort
 vertices[0] = (0,0)
 vertices[1] = (1,0)
 vertices[2] = (0,1)
 vertices[3] = (1,0)
 vertices[4] = (0,1)
 vertices[5] = (1,1)
 vertices[6] = (1,1)
 vertices[7] = (1,0)
 vertices[8] = (2,0)

After sort
 vertices[0] = (0,0)
 vertices[1] = (1,0)
 vertices[2] = (0,1)
 vertices[3] = (1,0)
 vertices[4] = (0,1)
 vertices[5] = (1,1)
 vertices[6] = (1,1)
 vertices[7] = (1,0)
 vertices[8] = (2,0)

After erasing duplicates
 vertices[0] = (0,0)
 vertices[1] = (1,0)
 vertices[2] = (0,1)
 vertices[3] = (1,0)
 vertices[4] = (0,1)
 vertices[5] = (1,1)
 vertices[6] = (1,0)
 vertices[7] = (2,0)

Output Representation
 indices[0] = 0
 indices[1] = 0
 indices[2] = 0
 indices[3] = 0
 indices[4] = 0
 indices[5] = 0
 indices[6] = 0
 indices[7] = 0
 indices[8] = 0

I.e., std::sort leaves the data unchanged and std::lower_bound always returns zero. Since the data isn’t sorted, I would expect wrong results from the lookup, but not all zeros.
With GNU 9.3 you shouldn’t have C++20 support at all, right?

Yes, after moving to an 11.2 install, I now see the differences. I’ll investigate a bit further, but will most likely need to pass this on to engineering for investigation.


I added a problem report, TPR #31963, and sent it to engineering for investigation.

Thanks,
Mat
