To be more specific, I have two files exported from the same TensorFlow graph:
- The model is first converted to UFF format, and then I used trt.utils.uff_to_trt_engine to create a TRT engine. I was able to serve it with the NVIDIA inference server. (A rough sketch of this path is included after the list.)
- The graph file is generated with tensorflow.contrib.tensorrt.create_inference_graph, and I am working on serving it with either the NVIDIA inference server or TensorFlow Serving. (See the second sketch below.)
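
For reference, the first path looks roughly like this with the legacy TensorRT 3/4 Python API (exact module and function names vary between TensorRT versions, and the file and node names here are placeholders):

```python
import uff
import tensorrt as trt
from tensorrt.parsers import uffparser

# Convert the frozen TensorFlow graph to UFF
# ("input" / "output" are placeholder node names).
uff_model = uff.from_tensorflow_frozen_model("frozen_model.pb", ["output"])

# Parse the UFF model and build a standalone TensorRT engine.
parser = uffparser.create_uff_parser()
parser.register_input("input", (3, 224, 224), 0)
parser.register_output("output")

engine = trt.utils.uff_to_trt_engine(
    trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR),
    uff_model,
    parser,
    1,        # max batch size
    1 << 30)  # max workspace size in bytes

# Serialize the engine to a plan file that the inference server can load.
trt.utils.write_engine_to_file("model.plan", engine.serialize())
```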
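And the second path, roughly, with TF 1.x contrib (argument values and node names are again placeholders):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt_convert

# Load the frozen TensorFlow graph.
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph_def = tf.GraphDef()
    frozen_graph_def.ParseFromString(f.read())

# Rewrite the graph so that supported subgraphs are replaced by
# TRTEngineOp nodes; the result is still a TensorFlow GraphDef.
trt_graph_def = trt_convert.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=["output"],               # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16")

# The optimized GraphDef can be saved and served like any other TF graph.
with tf.gfile.GFile("trt_inference_graph.pb", "wb") as f:
    f.write(trt_graph_def.SerializeToString())
```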
I am a little confused - is there any difference between these two approaches? What exactly are a trt_engine and a trt_inference_graph (and a trt_plan)?