Hi All,
What do 1.kind and 2.kind mean in the following tcgen05 PTX instructions?
"tcgen05.mma.cta_group::1.kind::f16 [%0], [%1], %2, %3, {%5, %6, %7, %8}, p; \n\t"
"tcgen05.mma.cta_group::2.kind::f16 [%0], [%1], %2, %3, {%5, %6, %7, %8, %9, %10, %11, %12}, p; \n\t"
Thanks,
Hi smartvoice,
The qualifiers are not 1.kind and 2.kind, but cta_group::1 and cta_group::2.
The mma is executed on one SM (cta_group::1) or a pair of SMs (cta_group::2), so up to 8 tensor core units (4 per SM) can work together.
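To show how the opcode breaks down, here is a minimal sketch in plain C++ that splits it at each '.', which is how the qualifiers are delimited. The helper name split_opcode is made up for this illustration and is not part of any NVIDIA API.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): split a PTX opcode into its
// dot-separated parts. The "::" sequence stays inside a qualifier such as
// cta_group::1, so only '.' is treated as a separator.
std::vector<std::string> split_opcode(const std::string& opcode) {
    std::vector<std::string> parts;
    std::stringstream stream(opcode);
    std::string part;
    while (std::getline(stream, part, '.')) {
        parts.push_back(part);
    }
    return parts;
}
```

Calling split_opcode("tcgen05.mma.cta_group::2.kind::f16") gives four parts: tcgen05, mma, cta_group::2 and kind::f16. The 2 belongs to cta_group, and kind::f16 is a separate qualifier.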
The identifiers are cta_group::1 and cta_group::2, not 1.kind and 2.kind.
They are explained in the PTX docs.
Thanks for pointing it out.
For the scaling factors, it seems that the tcgen05 instructions read from shared memory. Is that correct?
Are all the scaling factors precomputed and then copied with TMA from global memory to shared memory before being fed to the tensor core?
if (cute::elect_one_sync()) {
  asm volatile(
    "{\n\t"
    ".reg .pred p;\n\t"
    "setp.ne.b32 p, %4, 0;\n\t"  // p = (scaleC != 0)
    "tcgen05.mma.cta_group::2.kind::mxf8f6f4.block_scale [%0], %1, %2, %3, [%5], [%6], p; \n\t"
    "}\n"
    :
    : "r"(tmem_c), "l"(desc_a), "l"(desc_b), "r"(uint32_t(idescE>>32)), "r"(scaleC),
      "r"(tsfa_addr), "r"(tsfb_addr));  // %5, %6: scale factor addresses for A and B
}