Turing Tensor core int4 operation

Can TensorRT support 4-bit integer quantization? I cannot find any example of it in the TensorRT sample source code.
If not, how can we implement it ourselves?
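In the meantime, the arithmetic itself can be prototyped in software. Below is a minimal sketch of symmetric linear quantization to a signed n-bit range (here 4-bit, i.e. [-8, 7]). The function names and the simple max-magnitude scale are illustrative assumptions, not TensorRT's actual calibration algorithm:

```python
import numpy as np

def quantize(x, scale, bits=4):
    # Symmetric signed range for the given bit width: int4 -> [-8, 7].
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)

def dequantize(q, scale):
    # Map integer codes back to (approximate) real values.
    return q.astype(np.float32) * scale

# Toy tensor; pick the scale so the largest magnitude maps to qmax.
# (A real calibrator would choose the scale from activation
# statistics rather than a plain max -- this is an assumption.)
x = np.array([0.5, -1.25, 2.0, -2.0], dtype=np.float32)
scale = np.abs(x).max() / 7.0
q = quantize(x, scale, bits=4)
x_hat = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
```

The same helper with `bits=8` gives the [-128, 127] range used for INT8 inference; the only difference is the number of representable levels.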


Please refer to the TensorRT support matrix, which lists the TensorRT layers, the hardware, and the precision modes that each layer supports.

Reference: https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#layers-precision-matrix

According to Table 3 of https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#layers-precision-matrix:

INT8 is mostly unsupported in TensorRT 5.0.4, apart from a few data-rearrangement layers. But if I compile the sampleINT8API example on GeForce RTX 2070 hardware, inference is about 3x faster compared with FP32, and about 40% faster than FP16.

How can it be faster if it is not supported?


The 2070 has CUDA compute capability 7.5, which, per https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#hardware-precision-matrix, supports the INT8 precision mode.

For more details on 8-bit inference with TensorRT, please see: