Decreased performance from FP16 to INT8 in TF-TRT on Jetson Xavier

Hello all,

I recently implemented the TF-TRT application provided by NVIDIA for FP16 inference on Jetson Xavier, using the following link:

The average runtime was 22.5 ms, which is pretty good on the Xavier.

The ultimate goal is to run TF-TRT with INT8 inference at a reasonable speed. So I re-implemented the same repo with INT8 inference, but the average runtime became drastically slower: 2175 ms, almost 2 seconds.

Could you please suggest a solution to improve the speed? I am using ssd_mobilenetv1_coco for both implementations.



1) First, please remember to maximize the device performance:

sudo ./
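The script path above is truncated; assuming the standard JetPack tools are meant, the usual commands before benchmarking are:

```shell
# Select the maximum power mode, then lock all clocks at their highest rates.
# nvpmodel and jetson_clocks are the standard JetPack performance tools.
sudo nvpmodel -m 0
sudo jetson_clocks
```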

2) How did you re-implement it for INT8 inference?

You can change it to INT8 by updating the configuration directly:

import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,      # your frozen GraphDef
    outputs=output_names,              # list of output node names
    max_workspace_size_bytes=1 << 25,
    precision_mode='INT8')

3) For Xavier, you can also try using DLA to offload work from the GPU.
However, DLA is not enabled in TensorFlow yet; you will need to use pure TensorRT to access it.
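As a rough sketch, targeting DLA from the standalone TensorRT Python API looks like the following. This assumes a recent TensorRT release; attribute names may differ on older JetPack versions.

```python
# Sketch: directing a TensorRT engine build onto DLA (not TF-TRT).
# Assumes the tensorrt Python package from JetPack is installed.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

config.default_device_type = trt.DeviceType.DLA   # run supported layers on DLA
config.DLA_core = 0                               # Xavier has two DLA cores: 0 and 1
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)     # fall back to GPU for unsupported layers
```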


Hello AastaLLL,

For the re-implementation of tf-trt I did the following:

  1. Yes, I executed the script ./

  2. I modified the TRT graph and set the precision mode to INT8 instead of FP16, but the model became extremely slow and the runtime was 2175 ms. My expectation is a real-time object detection application capable of processing images within 90 ms to 110 ms.

  3. Can you please elaborate on this point: how do I enable DLA using the pure TensorRT framework?

Is there another trick that can be done in order to decrease the average runtime?
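One thing worth checking when reporting average runtimes: TF-TRT builds its engines lazily on the first run, so the first few iterations should be excluded from the measurement. A small plain-Python helper (hypothetical names, not from the repo) to do that:

```python
import time

def average_runtime_ms(infer_fn, warmup=10, iters=50):
    """Average wall-clock time of infer_fn in ms, excluding warm-up runs."""
    for _ in range(warmup):        # engine build / caching happens here
        infer_fn()
    start = time.time()
    for _ in range(iters):
        infer_fn()
    return (time.time() - start) * 1000.0 / iters
```

Here `infer_fn` would wrap a single `sess.run(...)` call on one input image.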

Thanks for your guidance


This is unexpected.
We will compare the performance between INT8 and FP16 for ssd_mobilenetv1_coco.

We will update you with more information later.

Hello AastaLLL,

Thank you for the swift reply, I will be waiting for updates about this matter. Meanwhile, I will try to test the model with different hyperparameters and maybe play around with the architecture.



Could you help enable device placement logging and share the output with us?

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))


Hi AastaLLL,

I have Python 3 with TensorFlow 1.12 installed on the Xavier, so my configuration went as follows:

nvidia@jetson-0423618000780:~/Projects/4.TF_TRT_models$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-03-05 08:31:31.646708: I tensorflow/stream_executor/cuda/] ARM64 does not support NUMA - returning NUMA node zero
2019-03-05 08:31:31.647097: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.46GiB freeMemory: 323.72MiB
2019-03-05 08:31:31.647210: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2019-03-05 08:31:34.132172: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-05 08:31:34.132575: I tensorflow/core/common_runtime/gpu/]      0 
2019-03-05 08:31:34.132696: I tensorflow/core/common_runtime/gpu/] 0:   N 
2019-03-05 08:31:34.133256: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 146 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2
2019-03-05 08:31:34.135448: I tensorflow/core/common_runtime/] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2



We are able to reproduce this issue on our side.
With python 2.7, we got 114ms for FP16 but 2158ms for INT8.

We will update you once we find something.

Thank you for the follow-up. I will be waiting for your results as soon as you have a breakthrough.



We got some feedback from our internal team.

The script is using INT8 mode incorrectly.
INT8 mode in TF-TRT requires an additional calibration step. You are actually measuring the performance of the calibration graph.

The workflow for doing INT8 inference in TF-TRT for TF1.13 is as follows:

  • Create calibration graph: calib_graph = trt.create_inference_graph(frozen_graph, precision_mode='INT8', ...)
  • Create session and load calib graph
  • Run inference on small set of images using calib graph (10-500 images)
  • Convert calib graph to inference graph: trt_graph = trt.calib_graph_to_infer_graph(calib_graph)
  • Create session and load inference graph
  • Run inference
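The steps above can be sketched as follows for the TF 1.13 contrib API. This is a minimal outline, assuming `frozen_graph` (a GraphDef), `output_names`, an input tensor named `image_tensor:0`, and a small set of `calibration_images` are available from your pipeline:

```python
# Sketch of TF-TRT INT8 calibration for TF 1.13 (tensorflow.contrib API).
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# 1. Create the calibration graph.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_workspace_size_bytes=1 << 25,
    precision_mode='INT8')

# 2-3. Load the calib graph and run inference on a small calibration set.
with tf.Graph().as_default():
    tf.import_graph_def(calib_graph, name='')
    with tf.Session() as sess:
        outs = [sess.graph.get_tensor_by_name(n + ':0') for n in output_names]
        for img in calibration_images:   # roughly 10-500 representative images
            sess.run(outs, feed_dict={'image_tensor:0': img[None, ...]})

# 4. Convert the calibration graph into the INT8 inference graph.
trt_graph = trt.calib_graph_to_infer_graph(calib_graph)

# 5-6. Load trt_graph in a fresh session and benchmark that graph instead.
```

Note that the timing you reported was measured on the calibration graph (step 3), not on the converted `trt_graph` from step 4.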

Here are our TF-TRT object detection examples for your reference:


Do we just need to run the inference without doing anything extra for TensorFlow 1.13? Thanks.
It would be great if we could have more detailed code for 1.13 to do the inference on the calibration graph, as there is no material for TensorFlow 1.13 or below, only for 1.14, 1.15, and 2.0.

But DLA is not enabled in the TensorFlow yet, you will need to use pure TensorRT to access it.

Is there a timeline for NVDLA support in tftrt?