Originally published at: GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines
CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance general matrix multiplication (GEMM). It provides support for the NVIDIA Blackwell SM100 architecture.
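GEMM conventionally computes C = alpha * A * B + beta * C, where A is M x K, B is K x N, and C is M x N. The following naive C++ loop shows what that operation computes and can serve as a simple correctness check. It is not CUTLASS code and does not reflect how CUTLASS structures its kernels. The function name `reference_gemm` and the column-major storage layout are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM (not part of CUTLASS): C = alpha * A * B + beta * C.
// Assumed layout: column-major, so element (i, j) of a matrix with R rows
// is stored at index i + j * R.
//   A: M x K, B: K x N, C: M x N (C is read, then overwritten in place).
void reference_gemm(std::size_t M, std::size_t N, std::size_t K,
                    float alpha, const std::vector<float>& A,
                    const std::vector<float>& B,
                    float beta, std::vector<float>& C) {
  for (std::size_t j = 0; j < N; ++j) {      // columns of C
    for (std::size_t i = 0; i < M; ++i) {    // rows of C
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {  // inner (reduction) dimension
        acc += A[i + k * M] * B[k + j * K];
      }
      // Scale the product and blend it with the existing value of C.
      C[i + j * M] = alpha * acc + beta * C[i + j * M];
    }
  }
}
```

A triple loop like this is practical only as a correctness oracle on small inputs. Libraries such as CUTLASS replace it with kernels tuned for the target GPU.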