To be more specific, I have two files exported from the same TensorFlow graph:
- The model is first converted to UFF format, and then I used trt.utils.uff_to_trt_engine to create a TRT engine. I was able to serve it with the NVIDIA inference server. (A rough sketch of this path is included after the list.)
- The graph file is generated with tensorflow.contrib.tensorrt.create_inference_graph, and I am working on serving it with either the NVIDIA inference server or TensorFlow Serving. (See the second sketch below.)
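
For reference, the first path looks roughly like this with the legacy TensorRT 3/4 Python API (exact module and function names vary between TensorRT versions, and the file and node names here are placeholders):

```python
import uff
import tensorrt as trt
from tensorrt.parsers import uffparser

# Convert the frozen TensorFlow graph to UFF
# ("input" / "output" are placeholder node names).
uff_model = uff.from_tensorflow_frozen_model("frozen_model.pb", ["output"])

# Parse the UFF model and build a standalone TensorRT engine.
parser = uffparser.create_uff_parser()
parser.register_input("input", (3, 224, 224), 0)
parser.register_output("output")

engine = trt.utils.uff_to_trt_engine(
    trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR),
    uff_model,
    parser,
    1,        # max batch size
    1 << 30)  # max workspace size in bytes

# Serialize the engine to a plan file that the inference server can load.
trt.utils.write_engine_to_file("model.plan", engine.serialize())
```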
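And the second path, roughly, with TF 1.x contrib (argument values and node names are again placeholders):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt_convert

# Load the frozen TensorFlow graph.
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph_def = tf.GraphDef()
    frozen_graph_def.ParseFromString(f.read())

# Rewrite the graph so that supported subgraphs are replaced by
# TRTEngineOp nodes; the result is still a TensorFlow GraphDef.
trt_graph_def = trt_convert.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=["output"],               # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16")

# The optimized GraphDef can be saved and served like any other TF graph.
with tf.gfile.GFile("trt_inference_graph.pb", "wb") as f:
    f.write(trt_graph_def.SerializeToString())
```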
I am a little confused - is there any difference between these two approaches? What exactly are a trt_engine and a trt_inference_graph (and a trt_plan)?