Hi,
I have a large matrix of int8 data in shared memory that I would like to multiply by an fp16 matrix (also in shared memory) using Tensor Cores.
Is there any way to load the int8 data directly into an fp16 wmma::fragment, without first converting the int8 data to fp16 in shared memory and then loading the fragment from there?
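For reference, here is a minimal sketch of the workaround I'm describing (the conversion through a shared-memory fp16 staging buffer). The 16x16x16 tile size, pointer names, layouts, and the one-warp launch are illustrative assumptions, not my actual kernel:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cstdint>
using namespace nvcuda;

// Launch with a single warp (32 threads) for this illustrative tile.
__global__ void mixed_tile_mma(const int8_t* a_i8_gmem,   // hypothetical int8 source
                               const half*   b_f16_gmem,  // hypothetical fp16 source
                               float*        c_gmem)
{
    constexpr int M = 16, N = 16, K = 16;

    __shared__ int8_t a_i8 [M * K];   // int8 tile resident in shared memory
    __shared__ half   b_f16[K * N];   // fp16 tile resident in shared memory
    __shared__ half   a_f16[M * K];   // extra staging buffer I'd like to avoid

    // Stage the tiles in shared memory (details of the real copy omitted).
    for (int i = threadIdx.x; i < M * K; i += blockDim.x) a_i8[i]  = a_i8_gmem[i];
    for (int i = threadIdx.x; i < K * N; i += blockDim.x) b_f16[i] = b_f16_gmem[i];
    __syncthreads();

    // The conversion step in question: widen int8 -> fp16 element by element.
    for (int i = threadIdx.x; i < M * K; i += blockDim.x)
        a_f16[i] = __int2half_rn(static_cast<int>(a_i8[i]));
    __syncthreads();

    // Only then can the half fragment be loaded from the staging buffer.
    wmma::fragment<wmma::matrix_a, M, N, K, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, M, N, K, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, M, N, K, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a_f16, K);
    wmma::load_matrix_sync(b_frag, b_f16, N);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c_gmem, c_frag, N, wmma::mem_row_major);
}
```

Ideally I could skip the a_f16 buffer and the extra __syncthreads() and load the int8 tile straight into the fp16 fragment.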
Many thanks,