Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS

Originally published at the NVIDIA Technical Blog.

NVIDIA CUDA-X math libraries provide the fundamental numerical building blocks that enable developers to deploy accelerated applications across multiple high-performance domains, including AI and scientific computing. cuBLAS is a CUDA-X math library consisting of a highly optimized collection of basic linear algebra subroutines (BLAS) for matrix and vector operations, specifically tuned to get the best performance from NVIDIA GPUs.
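The article's title refers to emulating higher-precision arithmetic using lower-precision Tensor Core operations. The excerpt above does not describe how cuBLAS implements this, but the general idea behind this family of techniques can be sketched in plain Python: split each high-precision operand into a lower-precision "head" and a residual "tail", compute the partial products in the lower precision, and accumulate them at higher precision. This is only an illustrative sketch (the function names `to_f32`, `split`, and `emulated_mul` are invented for this example, not cuBLAS APIs):

```python
import struct

def to_f32(x):
    # Round a Python float (binary64) to the nearest binary32 value.
    return struct.unpack('f', struct.pack('f', x))[0]

def split(x):
    # Split x into a float32-representable head and a small residual tail,
    # so that x == hi + lo exactly in float64 arithmetic.
    hi = to_f32(x)
    lo = x - hi
    return hi, lo

def emulated_mul(a, b):
    # Emulate a high-precision product from four lower-precision
    # partial products, accumulated in float64. In the GPU setting,
    # the partial products are what Tensor Cores compute quickly.
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    return a_hi * b_hi + a_hi * b_lo + a_lo * b_hi + a_lo * b_lo
```

The emulated product recovers far more accuracy than a naive single-precision multiply, because the tail terms capture the bits that rounding to float32 discards. A real GEMM emulation applies the same decomposition to whole matrices, turning one high-precision matrix multiply into several lower-precision ones.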