I’m trying to check the FP16 performance of cuFFT. The CUDA Toolkit documentation for both CUDA 7.5 and CUDA 8.0 claims, under http://docs.nvidia.com/cuda/cufft/#introduction:
This version of the cuFFT library supports the following features: ... - Half-precision (16-bit floating point), single-precision (32-bit floating point) and double-precision (64-bit floating point). ...
Similarly, section 2.3.1 (http://docs.nvidia.com/cuda/cufft/#half-precision-transforms) indicates that half-precision transforms are supported.
However, neither the documentation, nor any of the cufft*.h header files, nor the types in cufftType_t, nor anything in cuda_fp16.h gives me any hint as to how to actually run such a transform :-(
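For reference, the closest candidate I can find is the extensible plan API in cufftXt.h: cufftXtMakePlanMany takes cudaDataType arguments, and library_types.h defines CUDA_C_16F, so my (untested) guess at what the docs intend is something like the sketch below — a 1D in-place complex-to-complex FP16 transform:

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>      // half2 (interleaved complex half)
#include <cufftXt.h>        // extensible cufftXt plan API
#include <library_types.h>  // cudaDataType / CUDA_C_16F

int main() {
    const long long n = 1024;       // FP16 transforms may be limited to power-of-2 sizes
    long long dims[1] = { n };

    half2 *data = NULL;             // interleaved complex half-precision data
    cudaMalloc(&data, n * sizeof(half2));

    cufftHandle plan;
    cufftCreate(&plan);

    // Guess: request half precision for input, output, and execution
    // by passing CUDA_C_16F everywhere a cudaDataType is accepted.
    size_t workSize = 0;
    cufftResult r = cufftXtMakePlanMany(plan, 1, dims,
                                        NULL, 1, 1, CUDA_C_16F,    // input layout/type
                                        NULL, 1, 1, CUDA_C_16F,    // output layout/type
                                        1, &workSize, CUDA_C_16F); // batch, work area, exec type
    if (r != CUFFT_SUCCESS) { printf("plan failed: %d\n", r); return 1; }

    // cufftXtExec takes void* pointers, so it is not tied to cufftComplex.
    r = cufftXtExec(plan, data, data, CUFFT_FORWARD);  // in-place forward FFT
    if (r != CUFFT_SUCCESS) { printf("exec failed: %d\n", r); return 1; }

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```

I have no idea whether this is the intended path — the section on half-precision transforms never mentions cufftXtMakePlanMany, and I don't know which GPU architectures (if any) this would require.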
What am I missing?
Or is this a documentation bug?