About the time taken for FP16 inference and INT8 inference in TensorRT

When performing inference with TensorRT, is it possible for INT8 inference to be slower than FP16 inference?

Yes, it is possible for INT8 inference to be slower than FP16 inference with TensorRT, for a few reasons.
Not every layer has an INT8 kernel on every GPU: when an INT8 implementation is missing (or simply loses to the FP16 tactic during kernel selection), TensorRT runs that layer in FP16/FP32, so the engine never gets the full benefit of the lower precision.
INT8 quantization also adds overhead of its own: TensorRT inserts quantize/dequantize and reformat operations wherever the precision changes between layers, and for small or memory-bound networks this conversion cost can outweigh the savings from INT8 math.
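If you want to verify this on a specific model, the simplest check is to build an FP16 engine and an INT8 engine from the same network and compare their measured latencies, e.g. with trtexec --onnx=model.onnx --fp16 versus trtexec --onnx=model.onnx --int8 --fp16. Below is a minimal Python sketch of the same idea using the TensorRT builder API; it assumes TensorRT 8.x bindings, an ONNX model at model.onnx, and a calibrator class MyEntropyCalibrator, all of which are placeholders rather than anything from the original post.

```python
# Minimal sketch (assumes TensorRT 8.x Python bindings; "model.onnx" and the
# calibrator class are placeholders, not something from the original post).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_serialized_engine(onnx_path, use_int8):
    """Parse an ONNX model and build a serialized engine with FP16 (and optionally INT8) enabled."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # allow FP16 kernels
    if use_int8:
        config.set_flag(trt.BuilderFlag.INT8)  # additionally allow INT8 kernels
        # Unless the ONNX graph already carries Q/DQ nodes, a calibrator (or
        # per-tensor dynamic ranges) has to be attached here, e.g.:
        # config.int8_calibrator = MyEntropyCalibrator(...)  # hypothetical helper

    return builder.build_serialized_network(network, config)

# Build both engines from the same model and save them for timing with trtexec.
for name, use_int8 in (("fp16", False), ("int8", True)):
    with open(f"model_{name}.engine", "wb") as f:
        f.write(build_serialized_engine("model.onnx", use_int8))
```

Timing both saved engines (for instance with trtexec --loadEngine=model_int8.engine) and looking at the per-layer profile then shows which layers actually run in INT8 and where reformat layers were inserted.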
