How to determine a good ThreadblockShape in CUTLASS

user55015 · November 18, 2021, 3:26pm

Why does the default GEMM configuration in CUTLASS use a ThreadblockShape of [128, 128, 8]? I understand that BlockM (128) and BlockN (128) might be chosen based on arithmetic intensity, but why is BlockK set to 8?

// include/cutlass/gemm/device/default_gemm_configuration.h
template <
  typename ArchTag,
  typename ElementA, 
  typename ElementB, 
  typename ElementC, 
  typename ElementAccumulator>
struct DefaultGemmConfiguration<
  arch::OpClassSimt, 
  ArchTag,
  ElementA, 
  ElementB, 
  ElementC, 
  ElementAccumulator> {
  
  static int const kAlignmentA = 1;
  static int const kAlignmentB = 1;
  using ThreadblockShape = GemmShape<128, 128, 8>;
  using WarpShape = GemmShape<32, 64, 8>;
  using InstructionShape = GemmShape<1, 1, 1>;
  static int const kStages = 2;

  using EpilogueOutputOp = epilogue::thread::LinearCombination<
    ElementC,
    1,
    ElementAccumulator,
    ElementAccumulator
  >;

  using Operator = arch::OpMultiplyAdd;
};
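
To make the arithmetic-intensity premise concrete, here is a rough back-of-the-envelope sketch. It is not CUTLASS code: `TileShape` and the helper functions are hypothetical names introduced for illustration, and the model assumes each mainloop iteration loads one M x K tile of A and one K x N tile of B with no padding. Under those assumptions, the flops per loaded element depend only on BlockM and BlockN, while BlockK scales the shared-memory footprint per stage.

```cpp
#include <cstddef>

// Hypothetical helper type for illustration only; not part of CUTLASS.
struct TileShape {
  int m, n, k;
};

// Elements of A (m x k) plus B (k x n) fetched per mainloop iteration.
constexpr std::size_t elements_loaded(TileShape t) {
  return static_cast<std::size_t>(t.m + t.n) * static_cast<std::size_t>(t.k);
}

// Each multiply-add is counted as 2 flops.
constexpr std::size_t flops(TileShape t) {
  return std::size_t{2} * t.m * t.n * t.k;
}

// Simplifies to 2*m*n / (m + n): the k factor cancels out.
constexpr double flops_per_element(TileShape t) {
  return static_cast<double>(flops(t)) /
         static_cast<double>(elements_loaded(t));
}

// Unpadded shared-memory estimate: one A tile and one B tile per stage.
// A real kernel may use more because of padding or layout choices.
constexpr std::size_t smem_bytes(TileShape t, int stages,
                                 std::size_t elem_bytes) {
  return elements_loaded(t) * static_cast<std::size_t>(stages) * elem_bytes;
}
```

With this model, `flops_per_element({128, 128, 8})` and `flops_per_element({128, 128, 32})` both evaluate to 128, while `smem_bytes` with 2 stages of 4-byte elements grows from 16384 bytes to 65536 bytes. So, under these simplifying assumptions, a larger BlockK costs shared memory without improving the per-element intensity; whether that is the actual reason for the default of 8 is not something this sketch can confirm.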
