CUTLASS: Fast Linear Algebra in CUDA C++

Originally published at: CUTLASS: Fast Linear Algebra in CUDA C++ | NVIDIA Technical Blog

Matrix multiplication is a key computation within many scientific applications, particularly those in deep learning. Many operations in modern deep neural networks are either defined as matrix multiplications or can be cast as such. As an example, the NVIDIA cuDNN library implements convolutions for neural networks using various flavors of matrix multiplication. Matrix multiplication is…
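To make concrete the claim that a convolution can be cast as a matrix multiplication, here is a minimal CPU-side C++ sketch of the widely used im2col lowering. It rests on simplifying assumptions: a single input channel, stride 1, no padding, row-major layouts, and cross-correlation semantics as in most deep learning frameworks. It is an illustration only, not the method cuDNN or CUTLASS actually uses, and the function names `matmul`, `conv2d_direct`, `im2col`, and `conv2d_gemm` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major C (M x N) = A (M x K) * B (K x N).
std::vector<float> matmul(const std::vector<float>& A, const std::vector<float>& B,
                          std::size_t M, std::size_t N, std::size_t K) {
  std::vector<float> C(M * N, 0.0f);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t j = 0; j < N; ++j)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
  return C;
}

// Hypothetical reference: direct "valid" convolution (cross-correlation) of a
// single-channel H x W image with F filters of size R x S, stride 1, no padding.
// Output layout is F x P x Q with P = H - R + 1 and Q = W - S + 1.
std::vector<float> conv2d_direct(const std::vector<float>& img, std::size_t H, std::size_t W,
                                 const std::vector<float>& filters, std::size_t F,
                                 std::size_t R, std::size_t S) {
  assert(H >= R && W >= S);  // precondition: filter fits inside the image
  const std::size_t P = H - R + 1, Q = W - S + 1;
  std::vector<float> out(F * P * Q, 0.0f);
  for (std::size_t f = 0; f < F; ++f)
    for (std::size_t p = 0; p < P; ++p)
      for (std::size_t q = 0; q < Q; ++q)
        for (std::size_t r = 0; r < R; ++r)
          for (std::size_t s = 0; s < S; ++s)
            out[(f * P + p) * Q + q] +=
                filters[(f * R + r) * S + s] * img[(p + r) * W + (q + s)];
  return out;
}

// Hypothetical im2col: builds an (R*S) x (P*Q) matrix whose column p*Q + q
// holds the R x S image patch whose top-left corner is at (p, q).
std::vector<float> im2col(const std::vector<float>& img, std::size_t H, std::size_t W,
                          std::size_t R, std::size_t S) {
  assert(H >= R && W >= S);
  const std::size_t P = H - R + 1, Q = W - S + 1;
  std::vector<float> cols(R * S * P * Q);
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t s = 0; s < S; ++s)
      for (std::size_t p = 0; p < P; ++p)
        for (std::size_t q = 0; q < Q; ++q)
          cols[(r * S + s) * (P * Q) + p * Q + q] = img[(p + r) * W + (q + s)];
  return cols;
}

// Convolution as a single GEMM: the filters, viewed as an F x (R*S) matrix,
// multiply the (R*S) x (P*Q) im2col matrix, giving an F x (P*Q) result that
// has the same memory layout as conv2d_direct's F x P x Q output.
std::vector<float> conv2d_gemm(const std::vector<float>& img, std::size_t H, std::size_t W,
                               const std::vector<float>& filters, std::size_t F,
                               std::size_t R, std::size_t S) {
  assert(H >= R && W >= S);
  const std::size_t P = H - R + 1, Q = W - S + 1;
  return matmul(filters, im2col(img, H, W, R, S), F, P * Q, R * S);
}
```

Calling `conv2d_direct` and `conv2d_gemm` on the same image and filters should give the same output. The appeal of the GEMM form is that the entire convolution becomes one matrix product, so a well-tuned matrix multiplication routine can carry the bulk of the work, at the cost of the extra memory the im2col matrix occupies.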