I tested the performance of the Xavier NX with TensorFlow, TF-TRT, OpenCV and SSD-MobilenetV2 pretrained on the COCO dataset, and I was quite disappointed. I only get 10 fps with the attached sample video, and the GPU does not seem to be heavily loaded.
2021-05-19 20:11:38.116810: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:486] There are 1850 ops of 29 different types in the graph that are not converted to TensorRT: Fill, Merge, Switch, Range, ConcatV2, ZerosLike, Identity, NonMaxSuppressionV3, Minimum, StridedSlice, ExpandDims, Unpack, TopKV2, Cast, Transpose, Placeholder, ResizeBilinear, Squeeze, Mul, Sub, Const, Greater, Shape, Where, Reshape, NoOp, GatherV2, AddV2, Pack, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
Lots of operations fall back to the TensorFlow implementation.
The data transfer cost increases when inference frequently switches between TensorFlow and TensorRT.
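One knob that can reduce this switching is TF-TRT's `minimum_segment_size`, which rejects tiny TensorRT segments so fewer TF↔TRT transitions remain. A minimal sketch, assuming the TF 1.15 `TrtGraphConverter` API and a frozen SSD-MobilenetV2 graph (the file path and output node names below are placeholders):

```python
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen detection graph (placeholder path).
with tf.io.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    frozen_graph = tf.compat.v1.GraphDef()
    frozen_graph.ParseFromString(f.read())

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    # Keep the detection outputs in TensorFlow (placeholder names).
    nodes_blacklist=['detection_boxes', 'detection_scores',
                     'detection_classes', 'num_detections'],
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16',       # Xavier NX has fast FP16
    minimum_segment_size=50)     # reject tiny TensorRT segments
trt_graph = converter.convert()
```

Larger segments mean fewer round trips between the two runtimes, at the cost of leaving more small ops in TensorFlow.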
Is the TensorFlow interface essential for you?
If not, we recommend converting the model into a TensorRT engine for optimal performance.
In our benchmark results, pure TensorRT inference with SSD Mobilenet-V1 can reach 909 fps.
So you can expect a much better result than with TF-TRT.
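A minimal sketch of the pure-TensorRT route, assuming the model has first been exported to ONNX (e.g. with tf2onnx; the file name is a placeholder) and using the TensorRT 7 Python API. Note that SSD post-processing such as NonMaxSuppression usually needs a TensorRT plugin, which this sketch omits:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)

# Parse the ONNX export of the model (placeholder file name).
with open('ssd_mobilenet_v2.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError('ONNX parsing failed')

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.set_flag(trt.BuilderFlag.FP16)   # use FP16 on Xavier NX

# Build and serialize the standalone engine.
engine = builder.build_engine(network, config)
with open('ssd_mobilenet_v2.engine', 'wb') as f:
    f.write(engine.serialize())
```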
Lots of operations fall back to the TensorFlow implementation.
Why is that? I’m not doing anything special, just converting the standard MobileNet model.
Is the TensorFlow interface essential for you? If not, we recommend converting the model into a TensorRT engine for optimal performance.
I thought that’s what I’m doing by using TF-TRT.
I want to perform transfer learning later, using a pretrained standard model and adding additional trainable layers. As I understand it, I need a framework like TF for this. What would be the recommended way to do it?
And can you confirm that 10 fps is really the maximum performance of SSD-MobilenetV2 using TensorFlow on the Xavier NX, even after optimizing?
Please note that TF-TRT uses the parser embedded in the TensorFlow GitHub repository, and its support matrix is relatively limited. Please find the details in the supported-ops link in your log above.
We recommend separating the training and inference stages: you can train the model with TensorFlow and deploy it with pure TensorRT.
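For the transfer-learning part of your question, the training side stays entirely in TensorFlow. A minimal TF 2.x Keras sketch (illustrative only: it adds a classification head to MobileNetV2; a detection model like SSD would instead go through the TF Object Detection API):

```python
import tensorflow as tf

# Pretrained backbone with its classifier removed; weights stay frozen.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base.trainable = False

# Add new trainable layers on top.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),  # placeholder classes
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(...) on your data, then export for the TensorRT deployment step:
model.save('saved_model', save_format='tf')
```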
Since pure TensorRT reaches much better performance on SSD Mobilenet-V1, we recommend moving to pure TensorRT instead.
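Once the engine is built, runtime inference no longer touches TensorFlow at all. A rough sketch of deserializing and running it, assuming TensorRT 7 with PyCUDA, a fixed input shape, and binding 0 as the input (buffer handling simplified; the engine file name matches the placeholder above):

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open('ssd_mobilenet_v2.engine', 'rb') as f, \
        trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers per binding.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Copy a (placeholder) preprocessed frame in, execute, copy results out.
host_bufs[0][:] = np.random.rand(host_bufs[0].size)
cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
context.execute_v2(bindings)
for i in range(1, engine.num_bindings):
    cuda.memcpy_dtoh(host_bufs[i], dev_bufs[i])
```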