Hello,
Using NVIDIA Nsight Compute (ncu), I obtained the name of a cuBLAS kernel: ampere_h16816gemm_128x64_ldg8_stages_64x3_nn.
I am curious about the meaning of 64x3 in this context. I am also having trouble understanding how the Z dimension is allocated and why it is set to 2.
Could someone please explain the following:
- What does 64x3 specifically refer to in this kernel name?
- How is the Z dimension distributed or utilized in this context, and why is it set to 2?
Thank you in advance for your help!