Sample code for integer arithmetic on RTX tensor cores?

cbuchner1 · January 21, 2019, 2:55pm

Hi,

I was wondering if there is any self-contained, compilable sample code out there that shows multiplying signed/unsigned 8-, 4-, or 1-bit integers on the new Turing tensor cores with 32-bit accumulation.

All I’ve seen so far is short snippets on a few AI conference slides, such as

__device__ void tensor_op_16_16_16(char *a, char *b, int *c)
{
    // 8-bit input fragments and a 32-bit integer accumulator
    wmma::fragment<wmma::matrix_a, 16, 16, 16, char, ...> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, char, ...> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int, ...> c_frag;

    wmma::load_matrix_sync(a_frag, a, ...);
    wmma::load_matrix_sync(b_frag, b, ...);
    wmma::fill_fragment(c_frag, 0);  // integer accumulator, so an integer zero
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, ...);
}

Christian

Robert_Crovella · January 22, 2019, 8:04pm

Not exactly what you are asking for, but there is additional detail in the programming guide:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma
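For illustration, here is a minimal sketch of what a self-contained sample could look like. It assumes the signed 8-bit WMMA path described in that section of the programming guide, with a single warp and a single 16x16x16 tile. The kernel name, the helper run_int8_wmma_check, and the test data are made up for this example, and the layouts (row-major A, column-major B) are just one possible choice. It is not taken from an official NVIDIA sample, and it needs a Turing-class GPU and something like nvcc -arch=sm_75.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes a single 16x16 tile C = A * B.
// A is row-major int8, B is column-major int8, C is row-major int32.
__global__ void wmma_int8_16x16x16(const signed char *a, const signed char *b, int *c)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, signed char, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    wmma::load_matrix_sync(a_frag, a, 16);   // leading dimension = 16 elements
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Hypothetical helper: runs the kernel once and compares against a CPU reference.
// Returns the number of mismatching elements, or -1 on a CUDA error.
int run_int8_wmma_check()
{
    const int N = 16;
    std::vector<signed char> h_a(N * N), h_b(N * N);
    std::vector<int> h_c(N * N, 0), ref(N * N, 0);

    // Arbitrary small test values in [-3, 3] and [-2, 2].
    for (int i = 0; i < N * N; ++i) {
        h_a[i] = static_cast<signed char>((i % 7) - 3);
        h_b[i] = static_cast<signed char>((i % 5) - 2);
    }

    // CPU reference: A(m,k) is at a[m*N + k]; B(k,n) is at b[n*N + k] (column-major).
    for (int m = 0; m < N; ++m)
        for (int n = 0; n < N; ++n)
            for (int k = 0; k < N; ++k)
                ref[m * N + n] += int(h_a[m * N + k]) * int(h_b[n * N + k]);

    signed char *d_a = nullptr, *d_b = nullptr;
    int *d_c = nullptr;
    cudaMalloc(&d_a, N * N * sizeof(signed char));
    cudaMalloc(&d_b, N * N * sizeof(signed char));
    cudaMalloc(&d_c, N * N * sizeof(int));
    cudaMemcpy(d_a, h_a.data(), N * N * sizeof(signed char), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), N * N * sizeof(signed char), cudaMemcpyHostToDevice);

    wmma_int8_16x16x16<<<1, 32>>>(d_a, d_b, d_c);   // exactly one warp
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_c.data(), d_c, N * N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < N * N; ++i)
        if (h_c[i] != ref[i]) ++mismatches;
    return mismatches;
}

int main()
{
    int bad = run_int8_wmma_check();
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Swapping signed char for unsigned char in the two input fragments and the host buffers should give the unsigned 8-bit variant. The 4-bit and 1-bit types use a different, experimental part of the interface and are not covered by this sketch.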

cbuchner1 · January 23, 2019, 1:03pm

I guess I will have a closer look at the internals of the Cutlass 1.2 library.
