Including CUB header breaks compilation with GCC 12 and SSE2 or better

fwyzard · May 31, 2023, 6:44am

Including <cub/cub.cuh> and <immintrin.h> breaks compilation with GCC 12, if the SSE2 instruction set is enabled.

Here is a simple reproducer, test.cu:

#include <cub/cub.cuh>
#include <immintrin.h>

__global__
void unused() {}

Compiling with

/usr/local/cuda-12.1/bin/nvcc -ccbin g++-12 --compiler-options "-msse2" test.cu -c -o test.o

fails with

/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
                  __v8hf __attribute__ ((__vector_size__ (16)));
                                         ^

/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fp16intrin.h(39): error: vector_size attribute requires an arithmetic or enum type
                  __v16hf __attribute__ ((__vector_size__ (32)));
                                          ^

/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fp16intrin.h(40): error: vector_size attribute requires an arithmetic or enum type
                  __v32hf __attribute__ ((__vector_size__ (64)));

(... 95 more errors ...)

Killed

I’ve traced this down to:

The header .../include/cub/cub.cuh indirectly includes the header .../include/cuda/std/detail/__config.

.../include/cuda/std/detail/__config defines _Float16 as
```
#define _Float16 __half
```
This conflicts with GCC’s use of _Float16 as a type name on x86 systems with SSE2, and on ARM.

_Float16 is used in GCC’s headers <avx512fp16vlintrin.h> and <avx512fp16intrin.h>, which are included by <immintrin.h>.
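
The substitution described above can be illustrated without CUDA installed. The following is a minimal C++ sketch, not code from either set of headers: demo_Float16 and demo_half are hypothetical stand-ins for _Float16 and __half, and the typedef is only modeled on the declarations that the error messages point at.

```cpp
#include <cassert>
#include <string>

// Two-level stringification: the argument is macro-expanded first, then quoted.
#define DEMO_STR2(...) #__VA_ARGS__
#define DEMO_STR(...) DEMO_STR2(__VA_ARGS__)

// Hypothetical stand-in for `#define _Float16 __half`. The real names are
// avoided so that this sketch compiles with any host compiler.
#define demo_Float16 demo_half

// Returns the text the compiler receives after preprocessing a typedef
// modeled on the vector typedefs in GCC's header.
inline std::string expanded_typedef() {
    return DEMO_STR(typedef demo_Float16 demo_v8hf __attribute__((__vector_size__(16))););
}

// Self-check: the macro has replaced the element type inside the typedef.
inline void check_expansion() {
    const std::string text = expanded_typedef();
    assert(text.find("demo_half") != std::string::npos);
    assert(text.find("demo_Float16") == std::string::npos);
}
```

Calling check_expansion() confirms that the element type of the typedef has been rewritten by the macro. With the real headers the replacement name is __half, which cuda_fp16.h declares as a struct, and that is consistent with the "requires an arithmetic or enum type" diagnostic.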

A simple workaround seems to be to #undef _Float16 after including <cub/cub.cuh>, but I do not know whether this leads to other conflicts in a more realistic program:

#include <cub/cub.cuh>
#undef _Float16
#include <immintrin.h>

__global__
void unused() {}

compiles fine.
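
If later code in the same file still relies on the macro, one possible variant is to hide it only for the duration of the GCC include, using the push_macro and pop_macro pragmas that GCC, Clang and MSVC support. This is a sketch under assumptions and has not been tested with nvcc or the real headers: demo_Float16 and demo_half are hypothetical stand-ins for _Float16 and __half, and the two functions take the place of "code inside the included header" and "code after the include".

```cpp
#include <cassert>
#include <cstring>

// Two-level stringification: the argument is macro-expanded first, then quoted.
#define DEMO_STR2(...) #__VA_ARGS__
#define DEMO_STR(...) DEMO_STR2(__VA_ARGS__)

// Hypothetical stand-in for `#define _Float16 __half` from the CUDA headers.
#define demo_Float16 demo_half

// Save the current definition, then hide it, as one would do just before
// `#include <immintrin.h>`.
#pragma push_macro("demo_Float16")
#undef demo_Float16
// Code placed here (for example an included header) sees a plain identifier.
inline const char* name_seen_by_header() { return DEMO_STR(demo_Float16); }
#pragma pop_macro("demo_Float16")

// After pop_macro the saved definition is active again.
inline const char* name_seen_afterwards() { return DEMO_STR(demo_Float16); }

// Self-check of both states.
inline void check_scoped_undef() {
    assert(std::strcmp(name_seen_by_header(), "demo_Float16") == 0);
    assert(std::strcmp(name_seen_afterwards(), "demo_half") == 0);
}
```

Restoring the macro also restores the conflict for any GCC header included afterwards, so this variant only makes sense when all such includes sit between the two pragmas.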

fwyzard · May 31, 2023, 7:05am

I’ve also reported this as NVIDIA bug #4139266, and put a simple reproducer on GitHub at fwyzard/nvidia_bug_4139266.

fwyzard · August 30, 2023, 6:28am

The problem seems to be fixed in CUDA 12.2.2.
