Getting Immediate Speedups with NVIDIA A100 TF32

Originally published at:

The NVIDIA A100 brought the biggest single-generation performance gains ever in our company’s history. These speedups are a product of architectural innovations that include Multi-Instance GPU (MIG), support for accelerated structural sparsity, and a new precision called TF32, which is the focus of this post. TF32 is a great precision to use for deep learning…

NVIDIA's official open-source library CUTLASS (github/nvidia/cutlass) contains all the details of the TF32 data type, including storage, rounding, conversion, and arithmetic operations.
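
To get a feel for what storage and rounding mean here, the following is a minimal Python sketch that emulates reducing an FP32 value to TF32 precision (1 sign bit, 8 exponent bits, 10 mantissa bits). It is not taken from CUTLASS: the helper names (`float_to_bits`, `bits_to_float`, `round_to_tf32`) are hypothetical, and the round-to-nearest-even mode is an assumption made for illustration, so the rounding used by the hardware or by the CUTLASS conversion routines may differ.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    """Interpret a 32-bit unsigned int as an IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def round_to_tf32(x: float) -> float:
    """Emulate TF32 precision: keep 10 of the 23 FP32 mantissa bits.

    Assumption: round-to-nearest-even on the 13 dropped bits.
    x must be representable in FP32 (struct.pack raises OverflowError otherwise).
    """
    bits = float_to_bits(x)
    # Leave infinities and NaNs untouched (exponent field all ones).
    if (bits & 0x7F800000) == 0x7F800000:
        return bits_to_float(bits)
    # Lowest mantissa bit that survives; used to break ties toward even.
    lsb = (bits >> 13) & 1
    # Add just under half of the dropped range, plus 1 if the kept bit is odd.
    bits += 0x0FFF + lsb
    # Clear the 13 low mantissa bits that TF32 does not carry.
    bits &= 0xFFFFE000
    return bits_to_float(bits)

if __name__ == "__main__":
    for value in (1.0, 0.1, 1.0 + 2.0 ** -11, 3.14159265):
        print(value, "->", round_to_tf32(value))
```

Because the exponent field is left at 8 bits, the result keeps the FP32 range and only loses mantissa precision, which is why the value can still be held in an ordinary 32-bit float.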