I checked both PTX ISA 8.4 and 8.5, and in both the example given for mma.sync.aligned.m16n8k16.row.col.f32.bf16.bf16.f32
is:
.reg .f16x2 %Ra<2>, %Rb<1>;
.reg .f32 %Rc<4>, %Rd<4>;
mma.sync.aligned.m16n8k16.row.col.f32.bf16.bf16.f32
{%Rd0, %Rd1, %Rd2, %Rd3},
{%Ra0, %Ra1, %Ra2, %Ra3},
{%Rb0, %Rb1},
{%Rc0, %Rc1, %Rc2, %Rc3};
How does .reg .f16x2 %Rb<1>; result in both %Rb0 and %Rb1? Wouldn’t it just expand to a single %Rb0?
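For reference, this is the declaration I would have expected. It is only a sketch based on my reading of the parameterized-name rule, where %r<N> declares %r0 through %r(N-1). The counts 4 and 2 are my assumption, picked to match the operand lists in the instruction (the same mismatch shows up for %Ra<2>, which is used as %Ra0 through %Ra3). I kept the .f16x2 type from the documented example and have not run this through ptxas.

```ptx
// Assumption: register counts chosen to match the operand lists of the
// mma.sync example, using the rule that NAME<N> declares NAME0 .. NAME(N-1).
.reg .f16x2 %Ra<4>, %Rb<2>;   // %Ra0..%Ra3 and %Rb0..%Rb1
.reg .f32   %Rc<4>, %Rd<4>;   // %Rc0..%Rc3 and %Rd0..%Rd3
```

With the documented %Ra<2> and %Rb<1>, I would expect only %Ra0, %Ra1 and %Rb0 to exist, so %Ra2, %Ra3 and %Rb1 would be undeclared.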
I’m assuming a trillion-dollar company wouldn’t publish the same typo in multiple ISA documents describing the use of their flagship hardware offering, so clearly I’m missing something. Am I meant to interpret %Rb0 as %Rb0.x and %Rb1 as %Rb0.y? Or maybe %Rb<1> is a special case that doesn’t number ‘%Rb’ starting from offset 0, so that ‘0’ and ‘1’ act like magic indexes into ‘%Rb’? Neither section 5.4.6 nor section 5.2.4 of the PTX ISA offers any insight.