Just Released: CUTLASS 3.8

Originally published at: GitHub - NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines

CUTLASS 3.8 adds support for the NVIDIA Blackwell SM100 architecture. CUTLASS is a collection of CUDA C++ templates and abstractions for implementing high-performance GEMM (general matrix multiplication) computations.