GEMM tile dimensions for Tensor Cores

sr3007 · November 25, 2021, 4:32am

Hi,
I have observed that for a 512x64 GEMM operation, NVIDIA doesn’t use a single 512x64 GEMM tile and instead uses 4 rounds of a 128x64 tile. The list of tiles used includes a 256x128 tile that maps to a single SM, but there is no 512x64 tile, which would require the same memory. Why is this done?
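To make the arithmetic behind the question concrete, here is a minimal sketch in Python. It only counts elements; it does not model how cuBLAS or any other library actually picks tiles. The helper names and the K-slice depth `TILE_K = 32` are hypothetical values chosen for illustration. It shows that 256x128 and 512x64 cover the same number of output (accumulator) elements, while the A and B operand slices staged per K-step would differ under this simple model.

```python
from math import ceil

# Hypothetical K-slice depth, chosen only for illustration.
TILE_K = 32

def tiles_needed(m, n, tile_m, tile_n):
    """Number of tile_m x tile_n output tiles needed to cover an m x n result."""
    return ceil(m / tile_m) * ceil(n / tile_n)

def accumulator_elems(tile_m, tile_n):
    """Output (accumulator) elements held by one tile."""
    return tile_m * tile_n

def operand_elems_per_k_slice(tile_m, tile_n, tile_k):
    """Elements of A (tile_m x tile_k) plus B (tile_k x tile_n) for one K-slice."""
    return (tile_m + tile_n) * tile_k

# A 512x64 result covered by 128x64 tiles takes four tiles.
print(tiles_needed(512, 64, 128, 64))  # 4

# Both candidate tiles hold the same number of output elements.
print(accumulator_elems(256, 128), accumulator_elems(512, 64))  # 32768 32768

# Under the assumed TILE_K, the operand staging sizes are not equal.
print(operand_elems_per_k_slice(256, 128, TILE_K))  # 12288
print(operand_elems_per_k_slice(512, 64, TILE_K))   # 18432
```

So "the same memory" holds for the output tile, but under this model the more elongated 512x64 shape would stage more operand data per K-slice than 256x128. Whether that is the actual reason such a tile is absent is not something this sketch can confirm.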

Thank you.

Robert_Crovella · November 25, 2021, 5:24am

Is this a question about cuBLAS?