Error: identifier "__hdiv" is undefined when including cuda_fp16.h in a .cu file

Basic info:
Driver Version: 525.105.17
CUDA Version: 11.8
Compute Capability 8.6

When I use functions from cuda_fp16.h in a .cu file, the compiler reports that the identifier is undefined, like this:

(14:14:24) ERROR: /apollo/modules/perception/common/inference/tensorrt/BUILD:7:12: Compiling modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF … (remaining 223 arguments skipped)
nvcc warning : The ‘compute_35’, ‘compute_37’, ‘sm_35’, and ‘sm_37’ architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The ‘compute_35’, ‘compute_37’, ‘sm_35’, and ‘sm_37’ architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(40): error: identifier “__hdiv” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(113): error: identifier “__hfma” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(113): error: identifier “__hfma” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(122): error: identifier “__hfma2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(122): error: identifier “__hfma2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(221): error: identifier “__hmul” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(222): error: identifier “__hsub” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(225): error: identifier “hcos” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(225): error: identifier “hsin” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(225): error: more than one conversion function from “const __half” to a built-in type applies:
function “__half::operator float() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(204): here
function “__half::operator short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(222): here
function “__half::operator unsigned short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(225): here
function “__half::operator int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(228): here
function “__half::operator unsigned int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(231): here
function “__half::operator long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(234): here
function “__half::operator unsigned long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(237): here
function “__half::operator __nv_bool() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(241): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(232): error: identifier “__hfma” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(241): error: identifier “__hdiv” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(320): error: identifier “__hmul” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(321): error: identifier “__hsub” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(324): error: identifier “hcos” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(324): error: identifier “hsin” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(324): error: more than one conversion function from “const __half” to a built-in type applies:
function “__half::operator float() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(204): here
function “__half::operator short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(222): here
function “__half::operator unsigned short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(225): here
function “__half::operator int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(228): here
function “__half::operator unsigned int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(231): here
function “__half::operator long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(234): here
function “__half::operator unsigned long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(237): here
function “__half::operator __nv_bool() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(241): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(333): error: identifier “__hfma2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(332): error: identifier “__hadd2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(339): error: identifier “__hmul2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(339): error: identifier “__h2div” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(572): error: identifier “__hmul” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(573): error: identifier “__hsub” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(576): error: identifier “hcos” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(576): error: identifier “hsin” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(576): error: more than one conversion function from “const __half” to a built-in type applies:
function “__half::operator float() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(204): here
function “__half::operator short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(222): here
function “__half::operator unsigned short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(225): here
function “__half::operator int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(228): here
function “__half::operator unsigned int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(231): here
function “__half::operator long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(234): here
function “__half::operator unsigned long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(237): here
function “__half::operator __nv_bool() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(241): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(585): error: identifier “__hfma2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(584): error: identifier “__hadd2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(591): error: identifier “__hmul2” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(591): error: identifier “__h2div” is undefined

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(430): error: identifier “hcos” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(430): error: identifier “hsin” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(430): error: more than one conversion function from “const __half” to a built-in type applies:
function “__half::operator float() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(204): here
function “__half::operator short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(222): here
function “__half::operator unsigned short() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(225): here
function “__half::operator int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(228): here
function “__half::operator unsigned int() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(231): here
function “__half::operator long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(234): here
function “__half::operator unsigned long long() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(237): here
function “__half::operator __nv_bool() const”
/usr/local/cuda-11.8/bin/…/targets/x86_64-linux/include/cuda_fp16.hpp(241): here
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(439): error: identifier “__hfma2” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(438): error: identifier “__hadd2” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(445): error: identifier “__hmul2” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(445): error: identifier “__h2div” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(475): error: identifier “__hsub” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(475): error: identifier “__hmul” is undefined
detected during instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(425): error: identifier “__hmul” is undefined
detected during:
instantiation of “void rotateKernel_int8(int, int8_4 *, float, const int8_4 *, float, const T *, const T *, int, int, int, RotateInterpolation) [with T=float]”
(747): here
instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu(426): error: identifier “__hsub” is undefined
detected during:
instantiation of “void rotateKernel_int8(int, int8_4 *, float, const int8_4 *, float, const T *, const T *, int, int, int, RotateInterpolation) [with T=float]”
(747): here
instantiation of “void rotate_int8(int8_4 *, float, const int8_4 *, float, const T *, const T *, int *, RotateInterpolation, cudaStream_t) [with T=float]”
(758): here

41 errors detected in the compilation of “modules/perception/common/inference/tensorrt/plugins/rotateKernel.cu”.

Not all GPUs support half/fp16 arithmetic. You are evidently compiling for GPU architectures that do not support it:

Half/fp16 arithmetic support begins with GPU architectures of compute capability 5.3 or newer; the intrinsics in cuda_fp16.h (__hdiv, __hfma, hsin, hcos, etc.) are only declared when __CUDA_ARCH__ >= 530. That does not mean that every half or fp16 operation is supported in every CUDA version for every GPU of cc 5.3 or higher. To get the best support/compatibility:

  • use the latest available CUDA version
  • compile for the architecture that actually matches the GPU you intend to run on (i.e. probably cc8.6 based on what you have posted here)

In any event, you will not get half support on a cc 3.5 GPU (for example), and if you attempt to compile any of these half functions for that GPU target, you will get exactly these errors. That is expected behavior. The nvcc warnings in your log about the deprecated compute_35/sm_35 targets show that your build is in fact compiling for such architectures.
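
For illustration, here is a minimal sketch (the kernel and its names are hypothetical, not taken from rotateKernel.cu) of how half intrinsics such as __hdiv can be guarded so that the same .cu file still compiles for pre-cc 5.3 targets; the real fix in your case is simply to stop building for sm_35/sm_37:

```
// Minimal sketch, not Apollo code: __hdiv and friends are only declared for
// __CUDA_ARCH__ >= 530, so guard their use if the file may also be compiled
// for older device targets.
#include <cuda_fp16.h>

__global__ void scale_half(__half* out, const __half* in, __half divisor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 530)
  out[i] = __hdiv(in[i], divisor);  // native fp16 divide (cc >= 5.3)
#else
  // Fallback via float: the conversion intrinsics are available on all targets.
  out[i] = __float2half(__half2float(in[i]) / __half2float(divisor));
#endif
}
```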

If I am cross-compiling on a platform without a GPU, what should I do?

  1. Use the latest available CUDA version.

  2. Compile for (only) the architecture(s) you intend to run on (see the sketch below).
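
As a sketch (assuming nvcc is invoked directly; in Apollo the actual flags live in the Bazel crosstool configuration), restricting the build to the GPU you actually run on, and failing fast if an unsupported target slips in, could look like this:

```
// Sketch only: a compile-time guard for a .cu file that uses half intrinsics.
// __CUDA_ARCH__ is defined per device-compilation pass, so this fires for the
// offending sm_35/sm_37 passes and stays silent for sm_86.
#include <cuda_fp16.h>

#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 530)
#error "Half-precision intrinsics need compute capability 5.3+; remove pre-sm_53 targets (e.g. sm_35/sm_37) from the build."
#endif

// When calling nvcc by hand, limiting the targets would look roughly like:
//   nvcc -gencode arch=compute_86,code=sm_86 -c rotateKernel.cu
```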

Problems trying to build on windows platforms · Issue #1076 · NVIDIA/TransformerEngine (github.com)

Hi, could you have a look at this problem? I have a similar question about missing header files / functions.