Trade-off within gemm block size of cutlass

202476410arsmart · December 3, 2023, 7:23am

Hi! I am learning CuTe (CUTLASS), and in this example code I think each block computes a 128*128 tile of C, because:

  // Define block sizes (static)
  auto bM = Int<128>{};
  auto bN = Int<128>{};
  auto bK = Int<  8>{};

And the size of C is 5120*5120. Using ncu, I can see there are 40*40 blocks, so each block should compute 128*128.
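The arithmetic behind that block count can be checked with a small sketch. The helper names `ceil_div` and `num_blocks` are made up for illustration and are not part of CuTe; the sketch assumes one thread block per bM x bN tile of C, which is what the numbers above suggest.

```cpp
// Number of tiles needed to cover `extent` elements with tiles of size `tile`,
// rounded up so that a partial tile at the edge still gets its own block.
constexpr int ceil_div(int extent, int tile) {
    return (extent + tile - 1) / tile;
}

// Total thread blocks when each block owns one bM x bN tile of an M x N matrix C.
// (Hypothetical helper, assuming a one-tile-per-block launch.)
constexpr int num_blocks(int M, int N, int bM, int bN) {
    return ceil_div(M, bM) * ceil_div(N, bN);
}

// Values from the post: C is 5120 x 5120 and the tile is 128 x 128.
static_assert(ceil_div(5120, 128) == 40, "40 tiles along each dimension");
static_assert(num_blocks(5120, 5120, 128, 128) == 1600, "40 * 40 blocks in total");
```

So a 40*40 grid means 1600 blocks in total, consistent with each block owning one 128*128 tile.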

My ncu shows:

So why was this size chosen? I know there should be some balance between compute and latency, but how is this trade-off worked out? Could someone kindly point me to an in-depth analysis? Maybe some papers~
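One common way to reason about this trade-off (not necessarily how the CUTLASS authors picked their defaults) is arithmetic intensity. For each K-slice of depth bK, a block reads (bM + bN) * bK elements of A and B and performs 2 * bM * bN * bK floating-point operations, so bK cancels and the ratio depends only on bM and bN. The sketch below assumes float elements, assumes each tile element is fetched from global memory once per block, and ignores the write of C and any cache effects. `flops_per_byte` is a hypothetical helper name.

```cpp
// FLOPs performed per byte loaded from global memory for a bM x bN output tile.
// Per K-slice: 2*bM*bN*bK FLOPs, (bM + bN)*bK elements loaded; bK cancels out.
// (Hypothetical helper; a simplified model, not a CUTLASS API.)
constexpr double flops_per_byte(int bM, int bN, int bytes_per_elem) {
    return (2.0 * bM * bN) / (static_cast<double>(bM + bN) * bytes_per_elem);
}

// With 4-byte floats, doubling the tile edge doubles the intensity.
static_assert(flops_per_byte(64, 64, 4) == 16.0, "64 x 64 tile");
static_assert(flops_per_byte(128, 128, 4) == 32.0, "128 x 128 tile (this example)");
static_assert(flops_per_byte(256, 256, 4) == 64.0, "256 x 256 tile");
```

Under this model a larger tile does more math per byte fetched, which helps hide memory latency. On the other hand, it needs more registers and shared memory per block, so fewer blocks fit on an SM at once, and fewer blocks in total are available to fill the GPU. The chosen size sits somewhere between those pressures.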

Thank you!!!

202476410arsmart · December 3, 2023, 7:23am

By the way, the register usage is 98! And the shared memory is just 9126 bytes! CuTe is really highly efficient!
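For a rough sense of where the shared memory goes, here is a sketch that estimates the raw tile data only. It assumes the kernel stages one bM x bK tile of A and one bN x bK tile of B in shared memory, with 4-byte float elements, a single buffer stage and no padding; none of these are confirmed by the post. `smem_tile_bytes` is a hypothetical helper name.

```cpp
// Bytes of shared memory needed to hold one tile of A (bM x bK) and one tile
// of B (bN x bK). Assumes single buffering and no padding.
// (Hypothetical helper for illustration only.)
constexpr int smem_tile_bytes(int bM, int bN, int bK, int bytes_per_elem) {
    return (bM * bK + bN * bK) * bytes_per_elem;
}

// Block sizes from the example: bM = 128, bN = 128, bK = 8, float elements.
static_assert(smem_tile_bytes(128, 128, 8, 4) == 8192, "two 4 KiB tiles");
```

The figure reported above is somewhat larger than this 8192-byte estimate, so the profiler number presumably includes something beyond the raw tile data; that part is not modelled here.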

striker159 · December 3, 2023, 10:04am

Questions about the inner workings of CUTLASS may be better directed to the CUTLASS developers. You can ask them on the CUTLASS GitHub page.
