Turing Arch - INT4 ops with tensor cores

Hi guys, is there currently any way to perform INT4 ops with Turing tensor cores? cuBLAS only allows float16 and float32, according to https://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode

The cuDNN docs say INT8 data types are available, but only on sm_72, which is Xavier rather than Turing (Turing is sm_75): https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#tensor-ops-speedup-tips

Is a new API coming out soon or something like that? Cheers.

You need to use the experimental sub-byte WMMA features to perform INT4 tensor core operations; see: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma-subbyte
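
In case a concrete starting point helps, here's a minimal sketch of one warp doing a signed-INT4 8x8x32 tile multiply-accumulate through that experimental API. It assumes CUDA 10.0+ compiled with `-arch=sm_75`; the kernel name and the packed test data are just mine, not anything from the docs:

```cpp
// Minimal INT4 WMMA sketch (assumes CUDA 10.0+, nvcc -arch=sm_75).
#include <cstdio>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;
using namespace nvcuda::wmma::experimental;  // precision::s4 lives here

// One warp computes D = A * B + C on an 8x8x32 tile of signed 4-bit inputs.
// A and B are packed 8 values per int32; the accumulator is plain int32.
__global__ void int4_wmma_tile(const int *a, const int *b, int *d) {
    wmma::fragment<wmma::matrix_a, 8, 8, 32, precision::s4, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 8, 8, 32, precision::s4, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 8, 8, 32, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    // ldm is counted in (4-bit) elements: 32 per row of A / per column of B.
    wmma::load_matrix_sync(a_frag, a, 32);
    wmma::load_matrix_sync(b_frag, b, 32);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(d, c_frag, 8, wmma::mem_row_major);
}

int main() {
    // 8x32 A tile and 32x8 B tile, packed 8 int4 values per int32:
    // 8*32/8 = 32 ints each. Output is an 8x8 int32 tile.
    int *a, *b, *d;
    cudaMallocManaged(&a, 32 * sizeof(int));
    cudaMallocManaged(&b, 32 * sizeof(int));
    cudaMallocManaged(&d, 64 * sizeof(int));
    for (int i = 0; i < 32; ++i) {
        a[i] = 0x11111111;  // every 4-bit element = 1
        b[i] = 0x11111111;
    }
    int4_wmma_tile<<<1, 32>>>(a, b, d);  // one warp
    cudaDeviceSynchronize();
    printf("d[0] = %d (expect 32: dot product of 32 ones)\n", d[0]);
    cudaFree(a); cudaFree(b); cudaFree(d);
    return 0;
}
```

A couple of constraints worth knowing from the programming guide: the sub-byte shape is fixed at 8x8x32 for s4/u4 (8x8x128 for b1), matrix_a must be row-major and matrix_b col-major, and everything in nvcuda::wmma::experimental is explicitly subject to change in future CUDA releases.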