The NVCC intrinsic `__hfma` maps to the PTX instruction `fma.rn.f16`. I was surprised when looking at the PTX reference that this instruction (including the required PTX version / compute capability) isn't listed here: 1. Introduction — PTX ISA 8.4 documentation.
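For anyone who wants to reproduce the mapping, here is a minimal sketch. The kernel name `hfma_kernel`, the file name `hfma.cu`, and the `sm_70` target are my own choices for illustration and not taken from any NVIDIA documentation.

```cuda
#include <cuda_fp16.h>

// Hypothetical test kernel: out[i] = a[i] * b[i] + c[i] in half precision.
// The half-precision intrinsics need compute capability 5.3 or newer.
__global__ void hfma_kernel(const __half *a, const __half *b,
                            const __half *c, __half *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // The intrinsic in question; the generated PTX should show
        // which instruction it is lowered to.
        out[i] = __hfma(a[i], b[i], c[i]);
    }
}
```

Compiling with something like `nvcc -arch=sm_70 -ptx hfma.cu -o hfma.ptx` and searching the output for `fma.rn.f16` should show the instruction, though the exact PTX may differ between toolkit versions.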
And that seems to be just the start: for example, this page lists many intrinsics for FP16 arithmetic, and the instructions they map to all appear to be absent from the PTX specification.
Could somebody at NVIDIA look into this and bring the PTX specification up to date?
Thank you,
Wenzel