Deployment of models with TensorFlow is much slower on Drive PX2

I am trying to deploy various TensorFlow models (Object Detection, DeepLab) using the TensorFlow C++ API on the Drive PX2. Inference is much slower (almost 4x) than with a similar setup on an x86 Linux system with a GTX 1060.
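
For a comparison like this, it may help to time both systems in the same way, excluding the first few runs, which often include one-time initialization cost. The following is only a minimal sketch: `benchmark` and `run_once` are hypothetical names, and `run_once` stands for whatever callable performs a single inference (for example, a wrapper around a session run).

```python
import statistics
import time


def benchmark(run_once, warmup=10, iterations=50):
    """Time a zero-argument callable, discarding warm-up runs.

    run_once is a hypothetical placeholder for one inference call.
    Returns latency statistics in milliseconds.
    """
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without TensorFlow installed.
    print(benchmark(lambda: sum(range(10000)), warmup=3, iterations=20))
```

Running the same harness with the same model, input size, and batch size on both machines gives numbers that are easier to compare than a single end-to-end run.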

Running the TensorRT samples gives good results, so I assume there is some issue with the way TensorFlow manages the GPU. Since we only have TensorRT 4 on the PX2, these models do not seem easy to convert to UFF for deployment with TensorRT C++, if that is possible at all, which is why I am still trying to work with TensorFlow.

I would greatly appreciate any advice. Thanks.

Hi,

First, have you compiled the TensorFlow package for the PX2 GPU architectures, which are compute capability 6.1 and 6.2?
Here is a topic on compiling TensorFlow from source for your reference:
https://devtalk.nvidia.com/default/topic/1049100/general/tensorflow-installation-on-drive-px2-/post/5324624/#5324624
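
One way to sanity-check this is to compare the compute capabilities TensorFlow reports at runtime with the list the build was configured for. The sketch below is a hedged example, not an official tool. It assumes a TensorFlow 1.x build in which `tensorflow.python.client.device_lib.list_local_devices()` returns devices whose `physical_device_desc` string contains `compute capability: X.Y`, and that the build list is the comma-separated value given to `TF_CUDA_COMPUTE_CAPABILITIES` at configure time. The helper names and the sample device names are hypothetical.

```python
import re

# Matches the "name: ..., compute capability: X.Y" parts of a device description.
_DESC_PATTERN = re.compile(
    r"name:\s*(?P<name>[^,]+),.*compute capability:\s*(?P<major>\d+)\.(?P<minor>\d+)"
)


def compute_capabilities(device_descriptions):
    """Map each GPU name to its (major, minor) compute capability.

    Descriptions without a compute capability (for example CPU devices,
    whose description is empty) are skipped.
    """
    found = {}
    for desc in device_descriptions:
        match = _DESC_PATTERN.search(desc)
        if match:
            name = match.group("name").strip()
            found[name] = (int(match.group("major")), int(match.group("minor")))
    return found


def missing_from_build(device_descriptions, built_for):
    """Return GPU names whose capability is absent from the build list.

    built_for is assumed to be a string such as "6.1,6.2".
    """
    built = {
        tuple(int(part) for part in cc.strip().split("."))
        for cc in built_for.split(",")
        if cc.strip()
    }
    devices = compute_capabilities(device_descriptions)
    return sorted(name for name, cc in devices.items() if cc not in built)


if __name__ == "__main__":
    # Hand-written sample strings with made-up device names; on a real system
    # use [d.physical_device_desc for d in device_lib.list_local_devices()].
    sample = [
        "device: 0, name: example-dgpu, pci bus id: 0000:04:00.0, compute capability: 6.1",
        "device: 1, name: example-igpu, pci bus id: 0000:00:00.0, compute capability: 6.2",
    ]
    print(missing_from_build(sample, "6.1"))
```

An empty result only means the capabilities match the build list; it does not by itself rule out other causes of slow inference.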

I suppose you are running into some unsupported operations. Here are two possible solutions:

1. Implement them with a TensorRT plugin layer.
Here is an example that demonstrates the plugin API:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#extending

2. Try TF-TRT.
TF-TRT will try to convert the model into TensorRT and automatically fall back to TensorFlow for the unsupported operations.
Check this tutorial for more information: https://github.com/NVIDIA-AI-IOT/tf_trt_models
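
As a rough illustration of option 2, a TF-TRT conversion in TensorFlow 1.x could look like the sketch below. It assumes a TensorFlow 1.x build with TensorRT support, where `tensorflow.contrib.tensorrt.create_inference_graph` is available, and a frozen `GraphDef` that has already been loaded. The helper names and the default parameter values are illustrative assumptions, not tuned for the PX2.

```python
def tftrt_conversion_kwargs(output_names, precision_mode="FP16",
                            max_batch_size=1, workspace_bytes=1 << 25,
                            minimum_segment_size=50):
    """Build keyword arguments for a TF-TRT conversion (hypothetical helper).

    minimum_segment_size controls how many connected nodes a subgraph needs
    before it is replaced by a TensorRT engine; smaller segments stay in
    TensorFlow.
    """
    mode = precision_mode.upper()
    if mode not in ("FP32", "FP16", "INT8"):
        raise ValueError("unsupported precision mode: %r" % precision_mode)
    if not output_names:
        raise ValueError("at least one output node name is required")
    return {
        "outputs": list(output_names),
        "max_batch_size": max_batch_size,
        "max_workspace_size_bytes": workspace_bytes,
        "precision_mode": mode,
        "minimum_segment_size": minimum_segment_size,
    }


def convert_frozen_graph(frozen_graph_def, output_names, **overrides):
    """Convert a frozen GraphDef with TF-TRT (TensorFlow 1.x contrib API)."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow.contrib.tensorrt as trt

    kwargs = tftrt_conversion_kwargs(output_names, **overrides)
    return trt.create_inference_graph(input_graph_def=frozen_graph_def, **kwargs)
```

Operations that TensorRT cannot handle remain ordinary TensorFlow nodes in the returned graph, so the speed-up depends on how much of the model ends up inside TensorRT segments.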

Thanks.

Hi,

Yes, I have compiled TensorFlow according to that post.

  1. That is not preferable but will be my last resort.

  2. I have actually already done this. The difference in inference speed between optimized and non-optimized models is similar to that on an x86 Linux setup, so the issue likely lies with the layers in TensorFlow. It is strange that there are no problems with compiling and running TensorFlow, up until the slow inference speed when actually deploying a model.

Hopefully someone else has come across this issue and managed to discover the underlying cause. Thanks for your suggestions regardless.

Hi,

Sorry for the late update.

Based on experiment no. 2, most of your layers may not be supported by TensorRT and fall back to the TensorFlow implementation.
Would you mind checking our support matrix for your model first:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html
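
To see how much of a TF-TRT-converted graph actually ended up in TensorRT, it can help to count node types in the converted `GraphDef`. The sketch below assumes that converted segments appear as nodes with op type `TRTEngineOp` and that every other node still runs in TensorFlow. The function name is hypothetical, and the input is just an iterable of op-type strings, for example `(node.op for node in graph_def.node)`.

```python
from collections import Counter


def summarize_ops(op_types):
    """Summarize which op types were left in TensorFlow after TF-TRT.

    op_types is any iterable of op-type strings taken from a converted graph.
    Note that one TRTEngineOp node replaces a whole segment of original
    nodes, so these counts describe the converted graph, not the original.
    """
    counts = Counter(op_types)
    engines = counts.pop("TRTEngineOp", 0)
    return {
        "trt_engines": engines,
        "tensorflow_nodes": sum(counts.values()),
        # Most frequent ops still executed by TensorFlow; these are the
        # candidates to look up in the TensorRT support matrix.
        "top_tensorflow_ops": counts.most_common(5),
    }


if __name__ == "__main__":
    # Made-up op list for illustration only.
    example = ["TRTEngineOp", "Conv2D", "Conv2D", "Relu", "TRTEngineOp", "Placeholder"]
    print(summarize_ops(example))
```

If the summary shows few or no `TRTEngineOp` nodes and many remaining compute-heavy ops, most of the model is still running through TensorFlow.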

Thanks.