CUTLASS: Fast Linear Algebra in CUDA C++

jwitsoe · August 21, 2022, 11:44pm

Originally published at: CUTLASS: Fast Linear Algebra in CUDA C++ | NVIDIA Technical Blog

Matrix multiplication is a key computation within many scientific applications, particularly those in deep learning. Many operations in modern deep neural networks are either defined as matrix multiplications or can be cast as such. As an example, the NVIDIA cuDNN library implements convolutions for neural networks using various flavors of matrix multiplication. Matrix multiplication is…
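The excerpt says convolutions can be cast as matrix multiplication but does not say how cuDNN does it. One well-known lowering is "im2col": each filter-sized window of the input is copied into a column of a matrix, and the convolution then becomes a single GEMM with the filters viewed as a matrix. The following is a minimal C++ sketch of that idea, not cuDNN's implementation. It assumes a single image, stride 1, no padding, and row-major C x H x W input with K x C x R x S filters. The names `ConvShape`, `conv_direct`, `im2col`, `gemm`, and `conv_as_gemm` are hypothetical and are not part of cuDNN or CUTLASS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor: stride 1, no padding, row-major layouts.
struct ConvShape {
    int C, H, W;  // input channels, input height, input width
    int K, R, S;  // output channels (filters), filter height, filter width
    int P() const { return H - R + 1; }  // output height
    int Q() const { return W - S + 1; }  // output width
};

// Reference: direct convolution (cross-correlation, as used in deep learning).
std::vector<float> conv_direct(const ConvShape& s, const std::vector<float>& in,
                               const std::vector<float>& flt) {
    std::vector<float> out(s.K * s.P() * s.Q(), 0.0f);
    for (int k = 0; k < s.K; ++k)
        for (int p = 0; p < s.P(); ++p)
            for (int q = 0; q < s.Q(); ++q) {
                float acc = 0.0f;
                for (int c = 0; c < s.C; ++c)
                    for (int r = 0; r < s.R; ++r)
                        for (int t = 0; t < s.S; ++t)
                            acc += flt[((k * s.C + c) * s.R + r) * s.S + t] *
                                   in[(c * s.H + p + r) * s.W + q + t];
                out[(k * s.P() + p) * s.Q() + q] = acc;
            }
    return out;
}

// im2col: column (p*Q + q) holds the C*R*S input values under the filter
// window at output position (p, q). Result is (C*R*S) x (P*Q), row-major.
std::vector<float> im2col(const ConvShape& s, const std::vector<float>& in) {
    const int rows = s.C * s.R * s.S, cols = s.P() * s.Q();
    std::vector<float> m(rows * cols);
    for (int c = 0; c < s.C; ++c)
        for (int r = 0; r < s.R; ++r)
            for (int t = 0; t < s.S; ++t) {
                const int row = (c * s.R + r) * s.S + t;
                for (int p = 0; p < s.P(); ++p)
                    for (int q = 0; q < s.Q(); ++q)
                        m[row * cols + p * s.Q() + q] =
                            in[(c * s.H + p + r) * s.W + q + t];
            }
    return m;
}

// Naive GEMM: D (M x N) = A (M x Kd) * B (Kd x N), all row-major.
std::vector<float> gemm(int M, int N, int Kd, const std::vector<float>& A,
                        const std::vector<float>& B) {
    assert(A.size() == static_cast<std::size_t>(M) * Kd);
    assert(B.size() == static_cast<std::size_t>(Kd) * N);
    std::vector<float> D(M * N, 0.0f);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int l = 0; l < Kd; ++l) acc += A[i * Kd + l] * B[l * N + j];
            D[i * N + j] = acc;
        }
    return D;
}

// Convolution as GEMM: the filters, viewed as a K x (C*R*S) matrix, times the
// im2col matrix give the K x (P*Q) output, i.e. the K x P x Q output tensor.
std::vector<float> conv_as_gemm(const ConvShape& s, const std::vector<float>& in,
                                const std::vector<float>& flt) {
    return gemm(s.K, s.P() * s.Q(), s.C * s.R * s.S, flt, im2col(s, in));
}
```

For example, a 1-channel 3 x 3 input holding 1 through 9 convolved with a single 2 x 2 filter of ones gives the four window sums 12, 16, 24, 28 from both `conv_direct` and `conv_as_gemm`. The trade-off is memory: the im2col matrix copies each input element up to R*S times, but in exchange all the arithmetic lands in one GEMM, which is the kind of kernel a library such as CUTLASS is built to accelerate.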
