Undocumented PTX instruction `fma.rn.f16`

wenzel.jakob · April 5, 2024, 10:44am

The NVCC intrinsic `__hfma` maps to the PTX instruction `fma.rn.f16`. Looking at the PTX reference, I was surprised that this instruction (including the required PTX version and compute capability) isn't listed in the PTX ISA 8.4 documentation.

And that seems to be just the start: for example, this page lists many intrinsics for FP16 arithmetic, and they are all absent from the PTX specification.

Could somebody at NVIDIA look into this and bring the PTX specification up to date?

Thank you,
Wenzel

striker159 · April 5, 2024, 11:28am

The instructions are included in your linked document.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions
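A minimal sketch of the mapping discussed in this thread, assuming a GPU of compute capability 5.3 or later (earlier targets lack native `.f16` arithmetic) and compilation with something like `nvcc -arch=sm_70`. It computes the same FP16 fused multiply-add twice, once through the `__hfma` intrinsic and once written directly as `fma.rn.f16` via inline PTX, and checks that both paths produce identical bits. The kernel and helper names (`hfma_intrinsic`, `hfma_inline_ptx`, `hfma_paths_agree`) are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Path 1: the CUDA intrinsic, computing a*b + c with a single rounding.
__global__ void hfma_intrinsic(const __half* a, const __half* b,
                               const __half* c, __half* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __hfma(a[i], b[i], c[i]);
}

// Path 2: the same operation spelled as the PTX instruction fma.rn.f16.
// The "h" constraint binds a 16-bit (.u16) register; __half_as_ushort and
// __ushort_as_half reinterpret the bits without any numeric conversion.
__global__ void hfma_inline_ptx(const __half* a, const __half* b,
                                const __half* c, __half* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned short r;
        asm("fma.rn.f16 %0, %1, %2, %3;"
            : "=h"(r)
            : "h"(__half_as_ushort(a[i])), "h"(__half_as_ushort(b[i])),
              "h"(__half_as_ushort(c[i])));
        out[i] = __ushort_as_half(r);
    }
}

// Hypothetical helper: runs both kernels, writes the intrinsic's results to
// out (as float), and returns true only if no CUDA error occurred and both
// paths produced bit-identical FP16 results.
bool hfma_paths_agree(const float* af, const float* bf, const float* cf,
                      float* out, int n) {
    size_t bytes = n * sizeof(__half);
    std::vector<__half> ha(n), hb(n), hc(n), r1(n), r2(n);
    for (int i = 0; i < n; ++i) {
        ha[i] = __float2half(af[i]);
        hb[i] = __float2half(bf[i]);
        hc[i] = __float2half(cf[i]);
    }
    __half *da, *db, *dc, *d1, *d2;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMalloc(&d1, bytes); cudaMalloc(&d2, bytes);
    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dc, hc.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128, blocks = (n + threads - 1) / threads;
    hfma_intrinsic<<<blocks, threads>>>(da, db, dc, d1, n);
    hfma_inline_ptx<<<blocks, threads>>>(da, db, dc, d2, n);
    // Any earlier failure (no device, bad allocation, launch error) shows up here.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;

    cudaMemcpy(r1.data(), d1, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(r2.data(), d2, bytes, cudaMemcpyDeviceToHost);
    bool same = std::memcmp(r1.data(), r2.data(), bytes) == 0;
    for (int i = 0; i < n; ++i) out[i] = __half2float(r1[i]);

    cudaFree(da); cudaFree(db); cudaFree(dc); cudaFree(d1); cudaFree(d2);
    return ok && same;
}

int main() {
    const int n = 4;
    // Small inputs whose exact results are representable in FP16.
    float a[n] = {1.5f, -2.0f, 0.25f, 3.0f};
    float b[n] = {2.0f, 0.5f, 4.0f, -1.0f};
    float c[n] = {0.5f, 1.0f, -1.0f, 2.0f};
    float out[n];
    bool agree = hfma_paths_agree(a, b, c, out, n);
    for (int i = 0; i < n; ++i)
        std::printf("%g * %g + %g = %g\n", a[i], b[i], c[i], out[i]);
    std::puts(agree ? "intrinsic and inline PTX agree"
                    : "paths differ (or a CUDA error occurred)");
    return agree ? 0 : 1;
}
```

To look at the PTX itself, compiling with `-ptx` (e.g. `nvcc -arch=sm_70 -ptx hfma_check.cu`, file name hypothetical) should let you search the generated output for `fma.rn.f16`; the exact PTX emitted for `__hfma` may differ between CUDA versions. The inline-asm kernel will fail to assemble for targets below sm_53, which is why the `-arch` flag matters here.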

wenzel.jakob · April 5, 2024, 11:31am

Thank you for the clarification. I was confused because they are grouped into a separate section, even though the original section already covers several other precisions.

system · April 19, 2024, 11:32am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
