Getting Immediate Speedups with NVIDIA A100 TF32

jwitsoe · November 13, 2020, 9:03pm

Originally published at: https://developer.nvidia.com/blog/getting-immediate-speedups-with-a100-tf32/

The NVIDIA A100 brought the biggest single-generation performance gains ever in our company’s history. These speedups are a product of architectural innovations that include Multi-Instance GPU (MIG), support for accelerated structural sparsity, and a new precision called TF32, which is the focus of this post. TF32 is a great precision to use for deep learning…

haichengw · November 15, 2020, 3:43am

NVIDIA's official open-source library, CUTLASS (github/nvidia/cutlass), contains all the details of the TF32 data type, including storage, rounding, conversion, and arithmetic operations.
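For anyone who wants a feel for the rounding step before reading the library source, here is a minimal Python sketch. It is not taken from CUTLASS and may differ from its conversion routines. It assumes only the published TF32 layout (1 sign bit, 8 exponent bits, 10 mantissa bits, held in a 32-bit container) and emulates round-to-nearest-even by clearing the 13 low mantissa bits of an FP32 value. The helper names (`float_to_bits`, `bits_to_float`, `round_to_tf32`) are hypothetical and chosen for illustration.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit unsigned int as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def round_to_tf32(x: float) -> float:
    """Emulate FP32 -> TF32 rounding (illustrative, not CUTLASS code).

    TF32 keeps the sign and the 8 exponent bits of FP32 but only the top
    10 of the 23 mantissa bits, so the 13 low bits are rounded away here
    using round-to-nearest, ties-to-even.
    """
    bits = float_to_bits(x)
    # Leave Inf and NaN untouched (exponent field is all ones).
    if (bits & 0x7F800000) == 0x7F800000:
        return bits_to_float(bits)
    # Add just under half a TF32 ulp, plus one if the kept LSB is odd,
    # so that exact ties round toward an even mantissa.
    kept_lsb = (bits >> 13) & 1
    bits = (bits + 0x0FFF + kept_lsb) & 0xFFFFFFFF
    # Clear the 13 mantissa bits that TF32 does not store.
    bits &= 0xFFFFE000
    return bits_to_float(bits)


if __name__ == "__main__":
    for value in (1.0, 0.1, 1.0 + 2.0 ** -11, 3.14159265):
        print(value, "->", round_to_tf32(value))
```

The result is still an ordinary FP32 bit pattern with the low 13 mantissa bits set to zero, which is one way to picture why TF32 can share FP32's range while carrying less precision.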

