I am trying to deploy various TensorFlow models (Object Detection, DeepLab) with TensorFlow C++ on the Drive PX2. Inference when deployed with TensorFlow is much slower (almost 4x slower) than a similar setup on an x86 Linux system with a GTX 1060.
Running the TensorRT samples gives good results, so I assume there are some issues with the way TensorFlow is managing the GPU processes. Since we only have TensorRT 4 on the PX2, these models do not seem to be easily convertible to UFF for deployment with the TensorRT C++ API, if it is possible at all, which is why I am still trying to work with TensorFlow.
Yes, I have compiled TensorFlow according to that post.
That is not preferable but will be my last resort.
I have actually already done this. The difference in inference speed between optimized and non-optimized models is similar to that on an x86 Linux setup, so the issue likely lies with the layers in TensorFlow. It is strange that there are no problems with compiling and running TensorFlow, right up until the slow inference speed when actually deploying a model.
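For reference, the speed comparison above can be done along these lines. This is a minimal sketch, not the actual deployment code: the two `*_infer` functions are placeholder workloads standing in for session-run calls on the optimized and non-optimized models.

```python
import time

def benchmark(infer, n_warmup=10, n_runs=100):
    """Average wall-clock latency of one inference call, in milliseconds."""
    for _ in range(n_warmup):
        infer()  # warm-up: exclude one-off allocation/initialization costs
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    return (time.perf_counter() - start) / n_runs * 1e3

# Placeholder workloads; in practice these would wrap sess.run(...) on the
# non-optimized and optimized graphs respectively (hypothetical stand-ins).
def baseline_infer():
    sum(i * i for i in range(20000))

def optimized_infer():
    sum(i * i for i in range(5000))

baseline = benchmark(baseline_infer)
optimized = benchmark(optimized_infer)
print(f"baseline: {baseline:.3f} ms, optimized: {optimized:.3f} ms, "
      f"speedup: {baseline / optimized:.1f}x")
```

Warm-up runs matter here: the first few TensorFlow session calls include graph initialization and GPU memory allocation, which would otherwise skew the average.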
Hopefully someone else has come across this issue and managed to discover the underlying cause. Thanks for your suggestions regardless.
Based on experiment no. 2, most of your layers may not be supported by TensorRT and instead fall back to the TensorFlow implementation.
Would you mind checking our support matrix for your model first:
Support Matrix :: NVIDIA Deep Learning TensorRT Documentation
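One way to check this is to diff the op types in the frozen graph against the support matrix. A minimal sketch, with two stated assumptions: `SUPPORTED_OPS` below is an illustrative subset, not the full matrix for any TensorRT version, and `graph_ops` stands in for op types extracted from a frozen GraphDef (e.g. `{node.op for node in graph_def.node}`).

```python
# Illustrative subset of op types from the support matrix; the real set
# depends on the TensorRT version installed (TensorRT 4 on the PX2 here).
SUPPORTED_OPS = {"Conv2D", "MatMul", "Relu", "MaxPool", "BiasAdd",
                 "Softmax", "ConcatV2", "AvgPool"}

def unsupported_ops(graph_ops):
    """Return op types that would fall back to the TensorFlow runtime."""
    return sorted(set(graph_ops) - SUPPORTED_OPS)

# Hypothetical op list; Object Detection graphs typically contain
# postprocessing ops like these that are not TensorRT-convertible.
graph_ops = ["Conv2D", "Relu", "NonMaxSuppressionV3",
             "ResizeBilinear", "Softmax"]
print(unsupported_ops(graph_ops))
```

If the list of unsupported ops is long, most of the graph runs in TensorFlow rather than TensorRT, which would explain the slowdown.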