Load int8 shared memory data into an fp16 wmma::fragment

Hi,

I have a large matrix of int8 data in shared memory, which I would like to multiply by an fp16 matrix (also in shared memory) using tensor cores.

Do you know if there is any way to load the int8 data directly into an fp16 wmma::fragment, without first converting the int8 data to fp16 in shared memory and then loading the fragment from that converted copy?
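To make the question concrete, here is a minimal sketch of the two-step path I would like to avoid. It assumes a single warp, one 16x16x16 tile, row-major layouts, and a GPU with tensor cores (compiled with something like `nvcc -arch=sm_70`). The names `int8_times_fp16`, `a_i8`, `a_f16`, `b_f16` and `run_check` are only illustrative and not from my real code.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp multiplies a 16x16 int8 tile A by a 16x16 fp16 tile B (both row-major).
__global__ void int8_times_fp16(const int8_t* a_in, const half* b_in, float* c_out) {
    __shared__ int8_t a_i8[16 * 16];                // int8 data as it sits in shared memory
    __shared__ __align__(32) half a_f16[16 * 16];   // extra fp16 staging copy (what I want to avoid)
    __shared__ __align__(32) half b_f16[16 * 16];   // load_matrix_sync needs 256-bit alignment

    // Stage the inputs in shared memory (stands in for however the data got there).
    for (int i = threadIdx.x; i < 256; i += blockDim.x) {
        a_i8[i]  = a_in[i];
        b_f16[i] = b_in[i];
    }
    __syncthreads();

    // Step 1: convert int8 -> fp16 inside shared memory.
    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        a_f16[i] = __int2half_rn(a_i8[i]);
    __syncthreads();

    // Step 2: load the fp16 fragments from the converted copy and multiply.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a_f16, 16);
    wmma::load_matrix_sync(b_frag, b_f16, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c_out, c_frag, 16, wmma::mem_row_major);
}

// Host-side check against a plain float reference; returns the number of mismatches.
int run_check() {
    int8_t h_a[256];
    half   h_b[256];
    float  h_c[256];
    for (int i = 0; i < 256; ++i) {
        h_a[i] = (int8_t)(i % 7 - 3);            // small values so the products are exact
        h_b[i] = __float2half(0.5f * (i % 4));
    }

    int8_t* d_a; half* d_b; float* d_c;
    cudaMalloc(&d_a, sizeof(h_a));
    cudaMalloc(&d_b, sizeof(h_b));
    cudaMalloc(&d_c, sizeof(h_c));
    cudaMemcpy(d_a, h_a, sizeof(h_a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, sizeof(h_b), cudaMemcpyHostToDevice);

    int8_times_fp16<<<1, 32>>>(d_a, d_b, d_c);   // exactly one warp
    cudaMemcpy(h_c, d_c, sizeof(h_c), cudaMemcpyDeviceToHost);

    int bad = 0;
    for (int r = 0; r < 16; ++r)
        for (int c = 0; c < 16; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < 16; ++k)
                ref += (float)h_a[r * 16 + k] * __half2float(h_b[k * 16 + c]);
            if (ref != h_c[r * 16 + c]) ++bad;
        }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return bad;
}

int main() {
    printf("mismatches: %d\n", run_check());
    return 0;
}
```

The part I would like to get rid of is Step 1 together with the `a_f16` buffer, since for a large matrix it costs both extra shared memory and an extra synchronization before the fragment load.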

Many thanks,