Originally published at: Implementing High Performance Matrix Multiplication Using CUTLASS v2.8 | NVIDIA Technical Blog
High-performance CUTLASS template abstractions support general matrix multiply (GEMM) operations, convolution for AI workloads, and improved Strided-DGrad.