What does it mean that the grid size in the z dimension is more than one in cuBLAS GEMMs?

mustafaali · July 11, 2023, 8:19am

Hi,

When I profile cuBLAS GEMM kernels on V100 GPUs, I can see that the grid size in the z dimension is larger than one. Here is an example:

gemm shape = 512, 8192, 8192

I am trying to understand what grid.size.z = 3 means. Does it mean that the K dimension is being tiled across different thread blocks?
If so, how is the reduction across those thread blocks performed? I don't see a separate reduction kernel, which is what you would usually expect when optimizing a GEMM kernel that has k >> m, n.

How can I investigate this further using ncu?

I am running a cuBLAS GEMM on a V100 using the f16 data type (tensor core op).

Thanks,
Mustafa.

jmarusarz · July 13, 2023, 8:25pm

The internal decomposition done by cuBLAS is specific to the way the library is implemented. How the reduction is done is an implementation detail of the library. In general, CUDA grids can be 1-, 2-, or 3-dimensional; the choice is based on whatever gives the most convenient calculation for your indices. The cuBLAS forum (GPU-Accelerated Libraries - NVIDIA Developer Forums) may be able to provide details of the implementation or the reduction, but I can't be sure.
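
For readers wondering how a z dimension larger than one could relate to the K dimension, here is a minimal pure-Python sketch of the generic "split-K" idea. It is not cuBLAS's implementation, which this thread does not reveal. The function `split_k_gemm`, its parameters, and the tile sizes are all hypothetical names chosen for illustration. The sketch assumes each z index owns one contiguous slice of K and adds its partial sum directly into the output. On a GPU that accumulation would need something like atomic adds or a workspace buffer, and doing it in-kernel would be one way for no separate reduction kernel to appear in a profile.

```python
# Illustrative sketch only: a generic split-K GEMM, NOT how cuBLAS is known to work.
# All names (split_k_gemm, tile_m, tile_n, split_k) are hypothetical.

def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def split_k_gemm(A, B, m, n, k, tile_m, tile_n, split_k):
    """Compute C = A @ B, dividing the K dimension among `split_k` grid slices.

    Returns (C, grid), where grid mimics a CUDA-style (x, y, z) launch shape.
    """
    C = [[0.0] * n for _ in range(m)]
    # x and y tile the output matrix; z splits the reduction (K) dimension.
    grid = (ceil_div(n, tile_n), ceil_div(m, tile_m), split_k)
    k_per_slice = ceil_div(k, split_k)

    for bz in range(grid[2]):              # plays the role of blockIdx.z
        k_begin = bz * k_per_slice
        k_end = min(k, k_begin + k_per_slice)
        for by in range(grid[1]):          # plays the role of blockIdx.y
            for bx in range(grid[0]):      # plays the role of blockIdx.x
                for i in range(by * tile_m, min(m, (by + 1) * tile_m)):
                    for j in range(bx * tile_n, min(n, (bx + 1) * tile_n)):
                        partial = 0.0
                        for p in range(k_begin, k_end):
                            partial += A[i][p] * B[p][j]
                        # Several z slices add into the same output element.
                        # On a GPU this step would have to be an atomic add
                        # or a reduction through a workspace buffer.
                        C[i][j] += partial
    return C, grid
```

Calling `split_k_gemm(A, B, m, n, k, tile_m, tile_n, 3)` yields a grid whose z extent is 3, with each z slice covering roughly a third of K. Note that a library could also use the z dimension for something else entirely, such as batching, so a profile showing grid z = 3 does not by itself prove that split-K is in use.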

