Explanation of stages_64x3 in a kernel name

Hello,

Using the NVIDIA Nsight Compute CLI (ncu), I obtained the name of a cuBLAS kernel: ampere_h16816gemm_128x64_ldg8_stages_64x3_nn.

I am curious about the meaning of 64x3 in this context. I am having trouble understanding how the Z dimension is allocated and why it is set to 2.

Could someone please explain the following:

  1. What does 64x3 specifically refer to in this kernel name?
  2. How is the Z dimension distributed or utilized in this context, and why is it set to 2?

Thank you in advance for your help!

That looks like a CUTLASS kernel. Since it’s Ampere, start with the CUTLASS v2 docs.

You can also look at the v3 docs for more details.

I’m pretty sure 64x3 stands for cta_k x stages.

cta_k == Threadblock shape in the K dimension.

stages == Number of stages of threadblock-scoped matrix multiply.
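
To make that reading concrete, here is a minimal sketch that pulls the two numbers out of the kernel name. The helper name parse_stages_field is hypothetical, and the interpretation of the field as cta_k x stages is my guess about the naming scheme rather than documented cuBLAS behavior.

```python
import re


def parse_stages_field(kernel_name):
    """Return cta_k and the stage count from a '_stages_<cta_k>x<stages>' field.

    Assumption: the field is read as cta_k x stages, as guessed above.
    This is not a documented cuBLAS naming rule.
    """
    match = re.search(r"_stages_(\d+)x(\d+)", kernel_name)
    if match is None:
        raise ValueError(f"no stages field in {kernel_name!r}")
    return {"cta_k": int(match.group(1)), "stages": int(match.group(2))}


name = "ampere_h16816gemm_128x64_ldg8_stages_64x3_nn"
fields = parse_stages_field(name)
print(fields)  # {'cta_k': 64, 'stages': 3}
```

Under this reading, the kernel in the question uses a threadblock K extent of 64 and a 3-stage pipeline.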


That helps! Thanks!
Does anyone have a guess about the Z dimension?