NVIDIA Developer Forums
Why are there three GEMMs in CUTLASS?
202476410arsmart
August 28, 2024, 10:55am
[Image: screenshot of code, 1645×2853; contents not recoverable]
I understand parts 2 and 3: they are just k_iter 0 and k_iter in [1, k_end). But what is part 1? Why do we loop over k_block?
Robert_Crovella
August 28, 2024, 2:10pm
Please don’t post pictures of code on these forums.