Why do we have three GEMMs in CUTLASS?

202476410arsmart · August 28, 2024, 10:55am

I understand parts 2 and 3: they are just k_iter 0 and k_iter in [1, k_end). But what is part 1? Why do we loop over k_block?

Robert_Crovella · August 28, 2024, 2:10pm

Please don’t post pictures of code on these forums.
