We have experimented with running inference on our ResNet-like TensorFlow model (exported to UFF) using TensorRT at fp32, fp16, and int8 precision. Benchmarking inference latency at each precision, we see about a 60% decrease going from fp32 to fp16, which is great and makes sense, since theoretical FLOPS roughly double. Going from fp16 to int8, however, we only observed a 10-15% decrease in inference latency. Comparing theoretical peak throughput, fp16 should do about 11 TFLOPS and int8 about 22 TOPS (https://developer.nvidia.com/embedded/faq#xavier-performance), so we expected a similarly large improvement in inference time for int8.
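To make the expectation concrete, here is the back-of-envelope arithmetic we are working from. The fp16 and int8 peak numbers are from the NVIDIA FAQ linked above; the fp32 figure is an assumption (half the fp16 rate, per the "about 2x" reasoning), and of course real kernels rarely hit theoretical peak:

```python
# Back-of-envelope check: if inference were purely compute-bound,
# latency should scale inversely with peak math throughput.
fp32_tflops = 5.5   # assumption: fp16 rate / 2, for illustration only
fp16_tflops = 11.0  # per NVIDIA Xavier FAQ
int8_tops = 22.0    # per NVIDIA Xavier FAQ

# Expected fractional latency reduction at each precision step
fp32_to_fp16 = 1 - fp32_tflops / fp16_tflops
fp16_to_int8 = 1 - fp16_tflops / int8_tops

print(f"expected fp32->fp16 latency reduction: {fp32_to_fp16:.0%}")  # 50%
print(f"expected fp16->int8 latency reduction: {fp16_to_int8:.0%}")  # 50%
```

We observed ~60% for fp32→fp16 (roughly in line with, even better than, this estimate) but only 10-15% for fp16→int8, which suggests the int8 path is limited by something other than peak math throughput (memory bandwidth, layers falling back to fp16/fp32, etc.).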
We will continue to benchmark and profile our inference app, but I wanted to ask whether NVIDIA (or anyone else) can share benchmarks comparing fp16 vs. int8 performance using TensorRT on Xavier, and whether NVIDIA can offer any insight or advice on the lack of speedup going from fp16 to int8.