Getting Immediate Speedups with NVIDIA A100 TF32

Originally published at: https://developer.nvidia.com/blog/getting-immediate-speedups-with-a100-tf32/

The NVIDIA A100 brought the biggest single-generation performance gains in our company’s history. These speedups come from architectural innovations that include Multi-Instance GPU (MIG), support for accelerated structural sparsity, and a new precision called TF32, which is the focus of this post. TF32 is a great precision to use for deep learning…

https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/tfloat32.h contains all the details of the TF32 data type, including storage, rounding, conversion, and arithmetic operations.
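
For anyone who wants a feel for what the conversion does without reading the header, here is a minimal Python sketch that rounds an FP32 value to TF32 precision. It relies on the TF32 layout of 1 sign bit, 8 exponent bits, and 10 mantissa bits, so the low 13 of FP32's 23 mantissa bits are dropped. The function name `round_to_tf32` and the choice of round-to-nearest-even are my own assumptions for illustration; the CUTLASS header and the hardware conversion instructions may use a different rounding mode, so treat this as an approximation and not as a copy of what tfloat32.h does.

```python
import struct

FP32_MANTISSA_BITS = 23
TF32_MANTISSA_BITS = 10
DROPPED_BITS = FP32_MANTISSA_BITS - TF32_MANTISSA_BITS  # 13 low bits are discarded


def float_to_bits(x):
    """Reinterpret a value as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def round_to_tf32(x):
    """Hypothetical helper: round x to TF32 precision (assumed round-to-nearest-even).

    The result is still held in a 32-bit float; only the low 13 mantissa
    bits are cleared.
    """
    bits = float_to_bits(x)
    exponent = (bits >> FP32_MANTISSA_BITS) & 0xFF
    if exponent == 0xFF:
        # Infinity or NaN: return unchanged.
        return bits_to_float(bits)
    # Round to nearest, ties to even, on the lowest kept mantissa bit.
    lowest_kept_bit = (bits >> DROPPED_BITS) & 1
    bits += (1 << (DROPPED_BITS - 1)) - 1 + lowest_kept_bit
    # Clear the dropped bits.
    bits &= ~((1 << DROPPED_BITS) - 1) & 0xFFFFFFFF
    return bits_to_float(bits)
```

Calling `round_to_tf32(0.1)` gives a value that differs from 0.1 by less than 2**-15, which is half a TF32 unit in the last place for numbers in that range. The exponent range is untouched, which is why TF32 covers the same range as FP32 with reduced precision.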