Including the CUB header breaks compilation with GCC 12 and SSE2 or better

Including <cub/cub.cuh> and then <immintrin.h> breaks compilation with GCC 12 if the SSE2 instruction set is enabled.

Here is a simple reproducer:

#include <cub/cub.cuh>
#include <immintrin.h>

void unused() {}

Compiling it (saved here as test.cu) with

/usr/local/cuda-12.1/bin/nvcc -ccbin g++-12 --compiler-options "-msse2" -c test.cu -o test.o

fails with

/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
                  __v8hf __attribute__ ((__vector_size__ (16)));

/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fp16intrin.h(39): error: vector_size attribute requires an arithmetic or enum type
                  __v16hf __attribute__ ((__vector_size__ (32)));

/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fp16intrin.h(40): error: vector_size attribute requires an arithmetic or enum type
                  __v32hf __attribute__ ((__vector_size__ (64)));

(... 95 more errors ...)


I’ve traced this down to:

  • the header .../include/cub/cub.cuh indirectly includes the header .../include/cuda/std/detail/__config
  • .../include/cuda/std/detail/__config defines _Float16 as
    #define _Float16 __half
  • this conflicts with GCC’s use of _Float16 on x86 systems with SSE2, and on ARM

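_Float16 is used in GCC’s headers <avx512fp16vlintrin.h> and <avx512fp16intrin.h>, which are included by <immintrin.h>. With the macro in place, their vector typedefs end up with __half as the element type, which is what the vector_size error complains about.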
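
The error message follows from plain macro substitution: once _Float16 expands to __half, a vector typedef that expects a built-in arithmetic type gets a struct instead. A minimal sketch of that mechanism, which compiles without CUDA, is below. fake_half and scalar16 are hypothetical stand-ins for __half and for the _Float16 macro; they are not names from CUB or GCC.

```cpp
#include <type_traits>

// Hypothetical stand-in for CUDA's __half, which is a struct rather than
// a built-in arithmetic type.
struct fake_half { unsigned short bits; };

// Hypothetical stand-in for `#define _Float16 __half`.
#define scalar16 fake_half

// After preprocessing, every later use of `scalar16` reads as `fake_half`,
// so a declaration that needs an arithmetic element type no longer gets one.
constexpr bool element_is_arithmetic = std::is_arithmetic<scalar16>::value;
constexpr bool element_is_class = std::is_class<scalar16>::value;

static_assert(!element_is_arithmetic, "the substituted type is not arithmetic");
static_assert(element_is_class, "the substituted type is a class type");
```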

A simple workaround seems to be to #undef _Float16 after including <cub/cub.cuh>, but I have no idea whether this may lead to other conflicts in a more realistic program:

#include <cub/cub.cuh>
#undef _Float16
#include <immintrin.h>

void unused() {}

compiles fine.
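
On the concern about other conflicts: one possible variant hides the macro only while the intrinsics header is being included and restores it afterwards, using the push_macro and pop_macro pragmas. This assumes the compiler front end honors those pragmas. GCC does, but it has not been checked against nvcc here. The sketch below replaces the real CUB include with a hypothetical fake_half stand-in so that it can be tried with plain g++:

```cpp
// Stand-in for the effect of `#include <cub/cub.cuh>` described above.
// fake_half is a hypothetical substitute for __half.
struct fake_half { unsigned short bits; };
#define _Float16 fake_half

#pragma push_macro("_Float16")  // save the current macro definition
#undef _Float16                 // the compiler's own _Float16 is visible again

// A real x86 program would `#include <immintrin.h>` at this point.
#ifdef _Float16
constexpr bool macro_hidden_for_intrinsics = false;
#else
constexpr bool macro_hidden_for_intrinsics = true;
#endif

#pragma pop_macro("_Float16")   // restore the saved definition

#ifdef _Float16
constexpr bool macro_restored_afterwards = true;
#else
constexpr bool macro_restored_afterwards = false;
#endif

#undef _Float16  // sketch-only cleanup so the stand-in macro does not leak

static_assert(macro_hidden_for_intrinsics, "macro must be hidden at the include point");
static_assert(macro_restored_afterwards, "macro must be back after pop_macro");
```

Whether any later CUDA header actually needs the macro restored is not settled by the analysis above, so the plain #undef may be just as good in practice.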

I’ve also reported this as NVIDIA bug #4139266, and put a simple reproducer on GitHub at fwyzard/nvidia_bug_4139266.

The problem seems to be fixed in CUDA 12.2.2.
