Undocumented PTX instruction `fma.rn.f16`

The NVCC intrinsic `__hfma` maps to the PTX instruction `fma.rn.f16`. Looking at the PTX reference, I was surprised that this instruction (including its required PTX version / compute capability) isn't listed here: 1. Introduction — PTX ISA 8.4 documentation.

And that seems to be just the start: for example, this page lists many intrinsics for FP16 arithmetic, and they are all absent from the PTX specification.

Could somebody at NVIDIA look into this and bring the PTX specification up to date?

Thank you,
Wenzel

The instructions are included in your linked document.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions
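If you want to check the mapping yourself, here is a minimal sketch. The kernel name `hfma_kernel`, the helper `run_hfma`, and the file name `hfma.cu` are made up for illustration; the sketch also assumes a GPU and `-arch` setting of compute capability 5.3 or higher, which is what the half-precision arithmetic intrinsics in `cuda_fp16.h` require.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Fused multiply-add on half-precision values: *out = (*a) * (*b) + (*c).
// __hfma is the intrinsic discussed above; the generated PTX should
// contain an fma.rn.f16 instruction for this line.
__global__ void hfma_kernel(const __half* a, const __half* b,
                            const __half* c, __half* out) {
    *out = __hfma(*a, *b, *c);
}

// Hypothetical host helper: runs one half-precision FMA on the GPU
// and returns the result converted back to float.
// Error checking is omitted to keep the sketch short.
float run_hfma(float a, float b, float c) {
    __half h[3] = { __float2half(a), __float2half(b), __float2half(c) };
    __half* d = nullptr;
    cudaMalloc(&d, 4 * sizeof(__half));            // 3 inputs + 1 output
    cudaMemcpy(d, h, 3 * sizeof(__half), cudaMemcpyHostToDevice);
    hfma_kernel<<<1, 1>>>(d, d + 1, d + 2, d + 3);
    __half r;
    cudaMemcpy(&r, d + 3, sizeof(__half), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return __half2float(r);
}
```

Compiling with something like `nvcc -arch=sm_53 -ptx hfma.cu -o hfma.ptx` and searching the output for `fma.rn.f16` should show the instruction, which you can then compare against the half-precision section linked above.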

Thank you for the clarification. I was confused by how they are split out into a separate section, since the original one already lists different precisions.
