Problem with Tensor Cores: wmma::load_matrix_sync from a local array

I want to do int8 math on the Tensor Cores, but that is not supported right now,
so I convert my data from int8 to half
and store it in mat16[16].

When the half array is in local memory or in registers, the load does not work.

When the second parameter points to global memory, it works.

half mat16[16];  // per-thread array (local memory or registers)
wmma::load_matrix_sync(a[i], &mat16[0] /* does not work with local memory */, 0);

I set the third parameter (the leading dimension) to 0 because I want to control exactly which data gets copied.
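
A minimal sketch of one possible workaround, assuming the WMMA load only accepts pointers into shared or global memory (which matches the behavior described above): convert the int8 values to half in a shared-memory tile and load the fragments from there with a row stride of 16. The kernel name stage_and_multiply, the buffers in8 and out, and the one-warp launch are all hypothetical and only for illustration. It needs a GPU with compute capability 7.0 or later (for example, nvcc -arch=sm_70).

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// Hypothetical kernel, meant to be launched with exactly one warp (32 threads).
// It converts a 16x16 int8 tile to half in shared memory, then computes
// tile * transpose(tile) on the Tensor Cores.
__global__ void stage_and_multiply(const signed char *in8, float *out)
{
    // Shared-memory staging buffer, aligned to 32 bytes (256 bits).
    __shared__ __align__(32) half tile[16 * 16];

    // Each of the 32 threads converts 8 of the 256 values from int8 to half.
    for (int i = threadIdx.x; i < 16 * 16; i += 32)
        tile[i] = __int2half_rn(in8[i]);
    __syncthreads();  // make the whole tile visible before the warp-wide load

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c;

    wmma::fill_fragment(c, 0.0f);
    wmma::load_matrix_sync(a, tile, 16);  // shared-memory pointer, stride 16
    wmma::load_matrix_sync(b, tile, 16);  // same data read as column-major
    wmma::mma_sync(c, a, b, c);
    wmma::store_matrix_sync(out, c, 16, wmma::mem_row_major);
}

int main()
{
    signed char *in8;
    float *out;
    cudaMallocManaged(&in8, 16 * 16);
    cudaMallocManaged(&out, 16 * 16 * sizeof(float));

    // Small test values so the half products stay exact.
    for (int i = 0; i < 16 * 16; ++i)
        in8[i] = (signed char)(i % 4);

    stage_and_multiply<<<1, 32>>>(in8, out);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);

    cudaFree(in8);
    cudaFree(out);
    return 0;
}
```

The shared tile costs one extra copy per warp, but it keeps the data movement explicit: what ends up in the fragments is decided by what each thread writes into tile, not by the stride argument.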

I think it would be best to let me call the 4x4x4 operation myself. That would be more flexible, and I could do anything I want.