SSD-MobilenetV2 bad performance on XavierNX using Tensorflow + TF_TRT

SB_nvidia · May 19, 2021, 7:39pm

I tested the performance of the Xavier NX with TensorFlow, TF-TRT, OpenCV, and SSD-MobilenetV2 pretrained on the COCO dataset, and I was quite disappointed: I only get 10 fps on the attached sample video, and the GPU does not seem to be heavily loaded.
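Before comparing an fps figure like this, it can help to time only steady-state frames. The first inference calls can be much slower than later ones, for example if TF-TRT builds its engines at runtime. The sketch below assumes you call `tick()` once per fully processed frame in your own loop. `FpsMeter` is a hypothetical helper, not part of TensorFlow or OpenCV, and its clock is injectable so the example runs without a camera or model.

```python
import time
from typing import Callable, Optional


class FpsMeter:
    """Average frames per second, excluding the first `warmup` frames."""

    def __init__(self, warmup: int = 10,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.warmup = warmup
        self.clock = clock
        self.frames = 0
        # With no warm-up, timing starts now; otherwise it starts when the
        # last warm-up frame finishes.
        self.start: Optional[float] = clock() if warmup == 0 else None
        self.last: Optional[float] = None

    def tick(self) -> None:
        """Call once after each frame has been fully processed."""
        self.frames += 1
        now = self.clock()
        if self.frames == self.warmup:
            self.start = now
        elif self.frames > self.warmup:
            self.last = now

    def fps(self) -> float:
        """Timed frames divided by elapsed seconds (0.0 until data exists)."""
        if self.start is None or self.last is None or self.last <= self.start:
            return 0.0
        return (self.frames - self.warmup) / (self.last - self.start)


if __name__ == "__main__":
    # Fake clock: the slow warm-up frame ends at t=1.0 s, then one frame every 0.25 s.
    fake_times = iter([1.0, 1.25, 1.5, 1.75, 2.0])
    meter = FpsMeter(warmup=1, clock=lambda: next(fake_times))
    for _ in range(5):
        meter.tick()
    print(meter.fps())  # 4.0
```

In a real loop you would keep the default `time.perf_counter` clock and call `tick()` after post-processing, so that the number reflects the whole per-frame pipeline.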

Installed TensorFlow 1.15 following "Official TensorFlow for Jetson AGX XavierNX"
Installed OpenCV with CUDA support
Installed everything else following "How to configure your NVIDIA Jetson Nano for Computer Vision and Deep Learning" (PyImageSearch)
Created an optimized TensorRT graph
Attached: the scripts I used with their terminal output, the sample video, and a jtop screenshot
detect_realtime_nano.py (7.4 KB)
jtop_Info
Output_detect_realtime_nano.txt (6.0 KB)
Output_prepare_trt_graph.txt (28.3 KB)
prepare_trt_graph.py (2.2 KB)

Here is a demo (iCloud link) where you can see the jtop stats during inference.

What am I doing wrong? Or can someone confirm that this is the maximum performance of the XavierNX with this framework?

AastaLLL · May 20, 2021, 3:19am

Hi,

Based on the log below:

2021-05-19 20:11:38.116810: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:486] There are 1850 ops of 29 different types in the graph that are not converted to TensorRT: Fill, Merge, Switch, Range, ConcatV2, ZerosLike, Identity, NonMaxSuppressionV3, Minimum, StridedSlice, ExpandDims, Unpack, TopKV2, Cast, Transpose, Placeholder, ResizeBilinear, Squeeze, Mul, Sub, Const, Greater, Shape, Where, Reshape, NoOp, GatherV2, AddV2, Pack, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
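A log line like the one above can be summarized programmatically, which helps when comparing conversions with different settings. This is a minimal sketch that assumes the exact wording of this `segment.cc` message. Other TensorFlow versions may phrase it differently, and `parse_unconverted` is a hypothetical helper.

```python
import re
from typing import List, NamedTuple, Optional


class Unconverted(NamedTuple):
    n_ops: int        # total ops left in TensorFlow
    n_types: int      # number of distinct op types reported
    types: List[str]  # the op type names


# Assumes the message wording seen in this thread's TF 1.15 log.
_PATTERN = re.compile(
    r"There are (\d+) ops of (\d+) different types in the graph "
    r"that are not converted to TensorRT: (.*?)\s*\(For more"
)


def parse_unconverted(line: str) -> Optional[Unconverted]:
    """Extract the op count and op-type list from a TF-TRT segmenter log line."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    types = [t.strip() for t in m.group(3).split(",") if t.strip()]
    return Unconverted(int(m.group(1)), int(m.group(2)), types)


if __name__ == "__main__":
    # Log line quoted in this thread (timestamp prefix omitted).
    log = ("There are 1850 ops of 29 different types in the graph that are not "
           "converted to TensorRT: Fill, Merge, Switch, Range, ConcatV2, ZerosLike, "
           "Identity, NonMaxSuppressionV3, Minimum, StridedSlice, ExpandDims, Unpack, "
           "TopKV2, Cast, Transpose, Placeholder, ResizeBilinear, Squeeze, Mul, Sub, "
           "Const, Greater, Shape, Where, Reshape, NoOp, GatherV2, AddV2, Pack, "
           "(For more information see https://docs.nvidia.com/deeplearning/frameworks/"
           "tf-trt-user-guide/index.html#supported-ops).")
    info = parse_unconverted(log)
    print(info.n_ops, info.n_types, len(info.types))  # 1850 29 29
    print({"Switch", "Merge", "NonMaxSuppressionV3"} <= set(info.types))  # True
```

`Switch` and `Merge` are TensorFlow control-flow ops, and `NonMaxSuppressionV3` and `TopKV2` belong to box post-processing. This suggests that much of the fallback sits in the detection post-processing rather than in the MobileNet backbone. That is an inference from the op names, not something the log states.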

Many operations fall back to the TensorFlow implementation.
Data-transfer overhead grows when inference frequently switches between TensorFlow and TensorRT.
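To see why scattered unsupported ops hurt, here is a toy model. It splits a *linear* op sequence into TensorRT and TensorFlow segments and counts the hand-offs between them. Real TF-TRT segments a graph, not a list, so this is only an illustration. The `min_segment_size` rule is loosely modeled on TF-TRT's `minimum_segment_size` setting, and the supported-op set is hypothetical.

```python
from typing import List, Set, Tuple

Segment = Tuple[str, List[str]]  # ("TRT" or "TF", ops in that segment)


def partition(ops: List[str], supported: Set[str],
              min_segment_size: int = 3) -> List[Segment]:
    """Toy partition of a linear op sequence into TensorRT / TensorFlow segments."""
    # 1) Group consecutive ops by which engine could run them.
    runs: List[list] = []
    for op in ops:
        engine = "TRT" if op in supported else "TF"
        if runs and runs[-1][0] == engine:
            runs[-1][1].append(op)
        else:
            runs.append([engine, [op]])
    # 2) TRT runs shorter than min_segment_size stay in TensorFlow.
    for run in runs:
        if run[0] == "TRT" and len(run[1]) < min_segment_size:
            run[0] = "TF"
    # 3) Merge neighbours that now share an engine.
    merged: List[Segment] = []
    for engine, run_ops in runs:
        if merged and merged[-1][0] == engine:
            merged[-1][1].extend(run_ops)
        else:
            merged.append((engine, list(run_ops)))
    return merged


def crossings(segments: List[Segment]) -> int:
    """Number of TF<->TRT hand-offs (each may involve a data copy)."""
    return max(len(segments) - 1, 0)


if __name__ == "__main__":
    supported = {"Conv2D", "Relu6", "DepthwiseConv2dNative"}  # hypothetical set
    ops = ["Placeholder", "Conv2D", "Relu6", "DepthwiseConv2dNative",
           "Reshape", "Conv2D", "Cast",
           "Conv2D", "Relu6", "DepthwiseConv2dNative"]
    segs = partition(ops, supported)
    print([engine for engine, _ in segs])  # ['TF', 'TRT', 'TF', 'TRT']
    print(crossings(segs))                 # 3
```

Each extra unsupported op in the middle of a supported run can split one engine into two and add another hand-off. With 1850 unconverted ops, that adds up.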

Is the TensorFlow interface essential for you?
If not, we recommend converting the model into a TensorRT engine for optimal performance.

In our benchmark results, pure TensorRT inference of SSD Mobilenet-V1 reaches 909 fps.
So you should expect a much better result than with TF-TRT.
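Converting both figures into a per-frame time budget gives a rough sense of scale. The two numbers describe different models (V2 in this thread, V1 in the benchmark) and possibly different benchmark conditions, so the ratio is not a prediction for this setup. `frame_time_ms` is a hypothetical helper.

```python
def frame_time_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds for a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    reported = 10.0    # TF + TF-TRT, SSD-MobilenetV2, this thread
    benchmark = 909.0  # pure TensorRT figure quoted above, SSD Mobilenet-V1
    print(frame_time_ms(reported))             # 100.0
    print(round(frame_time_ms(benchmark), 2))  # 1.1
    print(round(benchmark / reported, 1))      # 90.9
```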

Thanks.

SB_nvidia · May 20, 2021, 4:47am

> There are lots of operations fallback to use TensorFlow implementation.
Why is that? I’m not doing anything special, just converting the standard MobileNet model.

> Is TensorFlow interface essential for you?
> If not, it’s recommended to convert the model into TensorRT engine for an optimal performance.
I thought that is what I am doing with TF-TRT.
I want to perform transfer learning later, starting from a pretrained standard model and adding extra trainable layers. As I understand it, I need a framework like TensorFlow for this. What would be the recommended way to do it?
And can you confirm that 10 fps really is the maximum performance of SSD-MobilenetV2 with TensorFlow on the Xavier NX, even after optimization?

SB_nvidia · May 23, 2021, 8:12pm

I would be very grateful for an answer.

AastaLLL · June 3, 2021, 9:12am

Hi,

Please note that TF-TRT uses the parser embedded in the TensorFlow GitHub repository,
and its support matrix is relatively limited. Please find the details below:

We recommend separating the training and inference stages.
You can train the model with TensorFlow and still deploy it with pure TensorRT.
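For the transfer-learning plan in this thread, one possible route is to train and export in TensorFlow and then convert the exported model outside TensorFlow. The sketch below assumes the model is exported as a SavedModel and that the third-party `tf2onnx` converter and TensorRT's `trtexec` tool are installed. On JetPack, `trtexec` is typically under `/usr/src/tensorrt/bin` rather than on `PATH`. This ONNX route is an assumption, not necessarily the path the reply has in mind. The paths and the helper name are hypothetical. SSD's post-processing (for example `NonMaxSuppressionV3`) may need extra work, such as a TensorRT NMS plugin, before the engine builds.

```python
import shlex
from typing import List, Tuple


def tf_to_trt_commands(saved_model_dir: str, onnx_path: str, engine_path: str,
                       opset: int = 11, fp16: bool = True) -> Tuple[List[str], List[str]]:
    """Build (but do not run) argument lists for a TF -> ONNX -> TensorRT export."""
    # Step 1: convert the trained TensorFlow SavedModel to ONNX with tf2onnx.
    to_onnx = ["python", "-m", "tf2onnx.convert",
               "--saved-model", saved_model_dir,
               "--output", onnx_path,
               "--opset", str(opset)]
    # Step 2: build and serialize a TensorRT engine from the ONNX file with trtexec.
    to_engine = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        to_engine.append("--fp16")  # the Xavier NX GPU supports FP16
    return to_onnx, to_engine


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever your export step writes the model.
    onnx_cmd, trt_cmd = tf_to_trt_commands("exported/saved_model", "ssd.onnx", "ssd.engine")
    print(" ".join(shlex.quote(a) for a in onnx_cmd))
    print(" ".join(shlex.quote(a) for a in trt_cmd))
    # To actually run them: subprocess.run(onnx_cmd, check=True), then trt_cmd.
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems when paths contain spaces, and it lets training scripts reuse the same helper after each fine-tuning run.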

Since pure TensorRT reaches much better performance on SSD Mobilenet-V1,
we recommend moving to pure TensorRT instead.

Thanks.
