TensorRT with Tensorflow models - what are the options?


How can I run TensorFlow models accelerated by TensorRT? My understanding is that you can use the UFF parser to load a frozen TensorFlow model and then perform inference. The other way would be to build tensorflow-gpu with TensorRT support and run inference that way (I think some refer to this as TF-TRT, right?). Is that correct? If so, what are the recommendations for getting the best performance? Is one of these ways faster than the other? Or does the "intermediate" UFF representation slow things down a bit?
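For reference, the UFF path mentioned above can be sketched roughly as follows with the TensorRT 5/6-era Python API (the frozen-graph path and the input/output tensor names are placeholders; imports are done lazily so the sketch loads even without TensorRT installed):

```python
def build_engine_from_uff(frozen_pb_path, input_name, input_shape, output_name):
    """Sketch: convert a frozen TF graph to UFF and build a TensorRT engine.

    Assumes the pre-TensorRT-8 API (uff module, UffParser, build_cuda_engine);
    the argument values are hypothetical examples, not fixed names.
    """
    # Imported inside the function so importing this file does not require
    # TensorRT or the uff converter to be installed.
    import uff
    import tensorrt as trt

    # Step 1: frozen TensorFlow graph -> UFF buffer.
    uff_buffer = uff.from_tensorflow_frozen_model(frozen_pb_path, [output_name])

    # Step 2: parse the UFF buffer into a TensorRT network and build an engine.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network()
    parser = trt.UffParser()
    parser.register_input(input_name, input_shape)
    parser.register_output(output_name)
    parser.parse_buffer(uff_buffer, network)

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 30  # 1 GiB scratch space
    return builder.build_cuda_engine(network)
```

The resulting engine can then be serialized to disk and used for inference without TensorFlow in the loop at all.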


TF-TRT converts as much of the graph as possible into TensorRT nodes. We are still working on enhancing its capabilities and making more layers compatible. It is recommended to use native TensorRT to build your model if you want to add any customized layers that are not supported by TF-TRT.
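As a minimal sketch of the TF-TRT path described above, assuming TensorFlow 1.x with TensorRT support built in (the output tensor names and precision mode are example values; the import is done lazily so the sketch loads even without TensorFlow installed):

```python
def convert_with_tftrt(frozen_graph_def, output_names):
    """Sketch: rewrite a frozen TF 1.x GraphDef so supported subgraphs
    become TRTEngineOp nodes; unsupported ops stay as TensorFlow ops.
    """
    # Imported inside the function so importing this file does not require
    # a TensorRT-enabled TensorFlow build.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    return trt.create_inference_graph(
        input_graph_def=frozen_graph_def,
        outputs=output_names,               # e.g. ["logits"] (example name)
        max_batch_size=1,
        max_workspace_size_bytes=1 << 30,   # 1 GiB per engine
        precision_mode="FP16",              # or "FP32" / "INT8"
    )
```

The returned GraphDef is then run through a normal TensorFlow session; this is what makes TF-TRT convenient when only part of the model is TensorRT-compatible.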