Low inference speed for INT8 compared to FP32 and FP16 using TensorFlow 1.13 and TensorRT 5.1.2

Hi,

We are currently testing the TensorRT integration in TensorFlow.

We use a self-compiled TensorFlow 1.13.1 with TensorRT 5.1.2. Yesterday I also tested TensorFlow 1.14 with TensorRT 5.1.5.

When I convert the nets to INT8 precision, I cannot use the "use_calibration" option of the create_inference_graph function from the tensorflow.contrib.tensorrt module.
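Roughly, this is the call I am trying (a minimal sketch; the .pb path and the output node name are placeholders for our own model):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Frozen FP32 GraphDef of our ResNet, loaded from a .pb file (path is a placeholder)
with tf.gfile.GFile('resnet_v1_50_frozen.pb', 'rb') as f:
    frozen_graph_def = tf.GraphDef()
    frozen_graph_def.ParseFromString(f.read())

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=['logits'],                  # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='INT8',
    use_calibration=True)                # <-- this keyword raises the TypeError below
```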

Error:
TypeError: create_inference_graph() got an unexpected keyword argument 'use_calibration'

I also cannot use the calib_graph_to_infer_graph function from the tensorflow.contrib.tensorrt module.

Error:

AttributeError: module 'tensorflow.contrib.tensorrt' has no attribute 'calib_graph_to_infer_graph'
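For reference, this is the two-step INT8 calibration flow I expected to use (a sketch, assuming a small set of representative calibration batches; the input/output tensor names are placeholders):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: build the calibration graph in INT8 mode
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,    # frozen FP32 graph, loaded as above
    outputs=['logits'],                  # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='INT8')

# Step 2: run a few representative batches through the calibration graph
# so TensorRT can collect the INT8 dynamic ranges
with tf.Graph().as_default():
    out = tf.import_graph_def(calib_graph, return_elements=['logits:0'])
    with tf.Session() as sess:
        for batch in calibration_batches:                 # placeholder iterable
            sess.run(out, feed_dict={'import/input:0': batch})

# Step 3: convert the calibration graph into the final INT8 inference graph --
# this is exactly the attribute that is missing in my build
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```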

When I leave both of them out, the net still converts to an INT8 precision net, but its speed is similar to the original TF net. Hence, it is even slower than the net that is only optimized with TensorRT.

I have no speedup problems with FP32 and FP16 precision.

TensorFlow was built on an Ubuntu 18.04 machine for CUDA 10.0 and cuDNN 7. The speed was tested on a GeForce GTX 1070 and a Quadro 5000.

The currently tested net is an adapted ResNet v1_50. It is strange that the accuracy of the INT8 net is identical to that of the FP32 net. I assume that the FP32 fallback must be used during (almost) the whole inference.
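A quick way to check that assumption (a sketch, using the GraphDef returned by create_inference_graph above) is to count how many nodes actually end up inside TensorRT engines versus how many remain native TF ops:

```python
# If almost everything stays as native TF nodes, the FP32 fallback path would
# explain both the unchanged accuracy and the missing speedup.
trt_ops = [n for n in trt_graph.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes:', len(trt_ops))
print('other nodes      :', len(trt_graph.node) - len(trt_ops))
```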

Does anyone have a solution for the speed loss when using INT8 compared to FP16/FP32?

Kind regards!

Hello,
any updates?