Description
Following the TF-TRT documentation (Accelerating Inference in TensorFlow with TensorRT User Guide - NVIDIA Docs), I converted my custom model to TensorRT; the code is shown below. The conversion succeeded, and inference now runs at nearly 2x the original speed. The next step would be to serialize the converted graph and save it as a TRT engine so I can later include it in my DeepStream application, but I cannot find any information about this step in the documentation. I don't want to build the TensorRT engine at runtime.
Simply put, I want to produce the same output I would get by running trtexec on an ONNX model with the --saveEngine flag:
trtexec --onnx=/model.onnx --saveEngine=trt.engine
Is this possible using TF-TRT?
Code
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

frozen_graph_name = 'frozen_graph.pb'  # path to the frozen TF graph (placeholder)
output_nodes = ['output_node']         # names of the graph's output ops (placeholder)
precision_mode = 'FP16'                # 'FP32', 'FP16' or 'INT8'

with tf.Session() as sess:
    print(f'output nodes: {output_nodes}')
    # Load the frozen graph from disk
    with tf.gfile.GFile(frozen_graph_name, 'rb') as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())
    if precision_mode is not None:
        converter = trt.TrtGraphConverter(
            input_graph_def=frozen_graph,
            nodes_blacklist=output_nodes,  # keep the output nodes out of TRT segments
            precision_mode=precision_mode,
            is_dynamic_op=False)           # build engines at conversion time
        frozen_graph = converter.convert()
        # save this frozen graph to disk to reuse later
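For the last comment in the code above, the part I can work out myself is serializing the converted GraphDef back to a .pb file. What I cannot find is how to get from there to a standalone engine file. My guess, and this is only an untested sketch, is that with is_dynamic_op=False the engines are built at conversion time and embedded in the graph as TRTEngineOp nodes, so the engine bytes might be extractable from each node's serialized_segment attribute (the output filenames below are placeholders):

# Persisting the converted GraphDef itself seems straightforward:
with tf.gfile.GFile('trt_graph.pb', 'wb') as f:  # placeholder filename
    f.write(frozen_graph.SerializeToString())

# Untested sketch: assuming each static TRTEngineOp node carries its
# serialized engine in the 'serialized_segment' attribute, dump each one
# as a standalone file, similar to what trtexec --saveEngine produces.
for node in frozen_graph.node:
    if node.op == 'TRTEngineOp':
        engine_path = node.name.replace('/', '_') + '.engine'
        with open(engine_path, 'wb') as f:
            f.write(node.attr['serialized_segment'].s)

Is something like this the intended way, or does TF-TRT provide an official API for this?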
Environment
TensorRT Version: 7.1.3
GPU Type: Jetson AGX Xavier
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15
Baremetal or Container (if container which image + tag): baremetal