I am trying to deploy a Keras model on a Jetson TX2 and want to reduce the inference time using TensorRT. So I first converted the Keras model into a .pb model using
https://github.com/amir-abdi/keras_to_tensorflow
Now I run the following code to optimize the model with TensorRT:
import tensorflow.contrib.tensorrt as trt  # TF-TRT (TensorFlow 1.x)

# frozen_graph and output_names come from the converted .pb (see below)
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 31,
    precision_mode='FP16',
    # output_saved_model_dir='/home/nvidia/Desktop/alexnet/alexnet/'
)
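For context, frozen_graph and output_names above come from loading the converted .pb file, roughly like this (the file path and the output node name below are placeholders for my actual values):

import tensorflow as tf

# load the frozen GraphDef produced by keras_to_tensorflow
with tf.gfile.GFile('model.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# name of the model's output node (placeholder)
output_names = ['dense_1/Softmax']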
The inference time is actually slower than with the original Keras model. What should I do to improve the inference time?
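In case the measurement method matters, this is roughly how I benchmark the converted graph; the input/output tensor names and input shape are placeholders for my actual ones:

import time
import numpy as np
import tensorflow as tf

# import the TRT-optimized GraphDef into a fresh graph
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(trt_graph, name='')

with tf.Session(graph=graph) as sess:
    # placeholder tensor names; replace with the model's actual input/output
    input_tensor = graph.get_tensor_by_name('input_1:0')
    output_tensor = graph.get_tensor_by_name('dense_1/Softmax:0')

    dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)

    # warm-up run so one-time graph/engine setup is not counted
    sess.run(output_tensor, feed_dict={input_tensor: dummy})

    start = time.time()
    for _ in range(100):
        sess.run(output_tensor, feed_dict={input_tensor: dummy})
    print('avg inference time: %.4f s' % ((time.time() - start) / 100))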