Low inference speed for INT8 compared to FP32 and FP16 using TensorFlow 1.13 and TensorRT 5.1.2

Hi,

We are currently testing the TensorRT integration in TensorFlow.

We use a self-compiled TensorFlow 1.13.1 with TensorRT 5.1.2. Yesterday I also tested TensorFlow 1.14 with TensorRT 5.1.5.

When I convert the nets to INT8 precision, I cannot use the "use_calibration" option of the create_inference_graph function from the tensorflow.contrib.tensorrt module.
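Roughly, this is the call I am trying (a minimal sketch; the .pb path and the output node name are placeholders for our own model):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Frozen FP32 GraphDef of our ResNet, loaded from a .pb file (path is a placeholder)
with tf.gfile.GFile('resnet_v1_50_frozen.pb', 'rb') as f:
    frozen_graph_def = tf.GraphDef()
    frozen_graph_def.ParseFromString(f.read())

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=['logits'],                  # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='INT8',
    use_calibration=True)                # <-- this keyword raises the TypeError below
```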

Error:
TypeError: create_inference_graph() got an unexpected keyword argument 'use_calibration'

I also cannot use the calib_graph_to_infer_graph function from the tensorflow.contrib.tensorrt module.

Error:

AttributeError: module 'tensorflow.contrib.tensorrt' has no attribute 'calib_graph_to_infer_graph'
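For reference, this is the two-step INT8 calibration flow I expected to use (a sketch, assuming a small set of representative calibration batches; the input/output tensor names are placeholders):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: build the calibration graph in INT8 mode
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,    # frozen FP32 graph, loaded as above
    outputs=['logits'],                  # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='INT8')

# Step 2: run a few representative batches through the calibration graph
# so TensorRT can collect the INT8 dynamic ranges
with tf.Graph().as_default():
    out = tf.import_graph_def(calib_graph, return_elements=['logits:0'])
    with tf.Session() as sess:
        for batch in calibration_batches:                 # placeholder iterable
            sess.run(out, feed_dict={'import/input:0': batch})

# Step 3: convert the calibration graph into the final INT8 inference graph --
# this is exactly the attribute that is missing in my build
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```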

When I leave both of them out, the net still converts to an INT8 precision net, but its speed is similar to the original TF net. Hence, it is even slower than the net that is only optimized with TensorRT.

I have no speedup problems with FP32 and FP16 precision.

TensorFlow was built on an Ubuntu 18.04 machine for CUDA 10.0 and cuDNN 7. The speed was tested on a GeForce GTX 1070 and a Quadro 5000.

The currently tested net is an adapted ResNet v1_50. It is strange that the accuracy of the INT8 net is identical to that of the FP32 net. I assume that the FP32 fallback must be used during (almost) the whole inference.
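A quick way to check that assumption (a sketch, using the GraphDef returned by create_inference_graph above) is to count how many nodes actually end up inside TensorRT engines versus how many remain native TF ops:

```python
# If almost everything stays as native TF nodes, the FP32 fallback path would
# explain both the unchanged accuracy and the missing speedup.
trt_ops = [n for n in trt_graph.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes:', len(trt_ops))
print('other nodes      :', len(trt_graph.node) - len(trt_ops))
```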

Does anyone have a solution for the speed loss when using INT8 compared to FP16/FP32?

Kind regards!

Hello,
any updates?