We have a Jetson Xavier NX on which we want to run object detection models.
My main question is: what is the best practice for running “bigger” object detection models at 30 FPS or more?
- Specifically, which framework is recommended for training models so that they convert most easily into the recommended final model format for inference?
- Which relatively “big” models, like ResNet50_640x640, are recommended that still reach 30 FPS or more?
- Which final model format is most recommended for inference on Jetson boards?
- Is there anything “hidden” to consider in order to use the Jetson Xavier NX to its full potential?
I figured this shouldn't be too hard, judging by the benchmarks at this link: Jetson Benchmark
However, I ran into a lot of problems trying to do it the following way.
Since I already know TensorFlow, I trained the following three models from the TensorFlow 2 Model Zoo:
- SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
- SSD MobileNet V2 FPNLite 320x320
- CenterNet Resnet50 V1 FPN 512x512
For inference on Jetson devices, I read that TensorRT engines are the way to go for maximum FPS, so I tried to convert the models with TF-TRT. For ResNet50 the conversion never worked because of out-of-memory (OOM) errors, even after I increased the swap memory to 16 GB. I tried limiting TensorFlow's memory to values between 500 MB and 4 GB, and setting “max_workspace_size_bytes” to values between 50 MB and 2 GB.
For MobileNetV2 the conversion didn't work properly because there wasn't enough memory for the “tactics” it wanted to build.
I followed the TF-TRT Documentation.
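For reference, the TF-TRT conversion described above might look roughly like the following minimal sketch. It assumes TensorFlow 2.x built with TensorRT support (as shipped for JetPack) and the `TrtGraphConverterV2` API from `tensorflow.python.compiler.tensorrt`; the helper names, directory arguments and default limits are hypothetical. The idea is to cap TensorFlow's own GPU allocation so that memory remains for TensorRT's tactic selection, and to bound that selection with `max_workspace_size_bytes`. In more recent TensorFlow releases, `conversion_params` is deprecated in favor of passing these options to the converter directly.

```python
"""Sketch: TF-TRT conversion with capped TensorFlow memory (assumes TF 2.x built with TensorRT)."""


def mib(n: int) -> int:
    """Convert mebibytes to bytes (helper for workspace limits)."""
    return n * 1024 * 1024


def convert_with_tftrt(saved_model_dir: str, output_dir: str,
                       tf_memory_limit_mib: int = 2048,
                       workspace_mib: int = 512) -> None:
    # Imported lazily so this module can be loaded without TensorFlow installed.
    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Cap TensorFlow's GPU allocation (memory_limit is in MB). This must run
    # before the GPU is first used. On Jetson, CPU and GPU share the same
    # physical memory, so whatever TF reserves is unavailable to TensorRT.
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=tf_memory_limit_mib)],
        )

    # The workspace size bounds the scratch memory TensorRT may use while
    # trying tactics; FP16 is supported by the Xavier NX GPU.
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        max_workspace_size_bytes=mib(workspace_mib),
        precision_mode=trt.TrtPrecisionMode.FP16,
    )
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params,
    )
    converter.convert()
    converter.save(output_dir)
```

Keeping the TensorFlow cap and the TensorRT workspace separate makes it easier to see which of the two budgets is actually running out.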
Then I tried pure TensorRT, converting the TensorFlow models from SavedModel format to ONNX via tf2onnx, since this seems to be the more recommended route. I ran constant folding on the ONNX model with Polygraphy and then tried to convert it to TensorRT, which failed with the following error: Unsupported ONNX data type: UINT8. As I understand it, I would have to remove the input layer with ONNX GraphSurgeon and then try again.
I followed the TensorRT Documentation.
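A minimal sketch of the GraphSurgeon step and the subsequent engine build, assuming the exported graph has a single `uint8` NHWC image input that feeds a `Cast` to float (as TF Object Detection API exports typically do), the `onnx`, `onnx_graphsurgeon` and `onnxruntime` packages, and the TensorRT 8.x Python API. File paths, helper names and the 320x320 default are hypothetical. Instead of deleting the input, it retypes it to `float32` so the existing `Cast` becomes a float-to-float no-op. Other parts of the TF2 detection graphs (for example the post-processing loops) may still fail to parse, so this is a starting point rather than a complete recipe.

```python
"""Sketch: make a tf2onnx-exported detector TensorRT-friendly (assumes onnx, onnx_graphsurgeon, tensorrt 8.x)."""


def retype_uint8_input(src_path: str, dst_path: str, batch_size: int = 1,
                       height: int = 320, width: int = 320) -> None:
    import numpy as np
    import onnx
    import onnx_graphsurgeon as gs

    graph = gs.import_onnx(onnx.load(src_path))
    image = graph.inputs[0]
    # TensorRT's ONNX parser rejects UINT8 tensors, so feed float32 instead;
    # the Cast node that followed the uint8 input now casts float -> float.
    image.dtype = np.float32
    # Pin a static NHWC shape (tf2onnx often leaves batch/height/width symbolic).
    image.shape = [batch_size, height, width, 3]

    graph.fold_constants()  # requires onnxruntime
    graph.cleanup().toposort()
    onnx.save(gs.export_onnx(graph), dst_path)


def build_fp16_engine(onnx_path: str, engine_path: str, workspace_mib: int = 1024) -> None:
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    # TensorRT 8.0-8.3 attribute; 8.4+ prefers config.set_memory_pool_limit(...).
    config.max_workspace_size = workspace_mib * 1024 * 1024
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
```

Retyping rather than deleting the input keeps the rest of the graph untouched, which is usually the least invasive way to get past the UINT8 check.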
At this point I simply tried running the models in SavedModel format with TensorFlow, to see whether they could be loaded on the Jetson Xavier NX at all. Even with the smallest model, MobileNetV2, I only achieved 12 FPS, although the Jetson Xavier NX is benchmarked at 909 FPS for MobileNetV1_300x300. That benchmark was obviously measured under ideal conditions, using an INT8-quantized TensorRT engine, but the difference in FPS is still huge.
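When comparing against published numbers, it helps to measure throughput the same way every time: discard warm-up iterations (the first calls include lazy initialization and engine building) and time many frames. Below is a minimal stdlib sketch; `infer` is a hypothetical stand-in for whatever runs one forward pass and must block until outputs are available on the host. For scale, 30 FPS leaves a budget of about 33.3 ms per frame, 12 FPS corresponds to about 83 ms, and 909 FPS to about 1.1 ms. It may also be worth checking whether the benchmark was taken in the maximum power mode with clocks locked (e.g. via `nvpmodel` and `jetson_clocks`), since that affects the comparison.

```python
import statistics
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], warmup: int = 10, iters: int = 100) -> dict:
    """Time `infer()` after a warm-up phase and report throughput statistics."""
    for _ in range(warmup):
        infer()  # first calls include one-off initialization costs
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()  # must synchronize, e.g. by copying outputs back to the host
        latencies.append(time.perf_counter() - start)
    mean_s = statistics.mean(latencies)
    return {
        "fps": 1.0 / mean_s,
        "mean_ms": mean_s * 1000.0,
        # Approximate 95th-percentile latency from the sorted samples.
        "p95_ms": sorted(latencies)[int(0.95 * (len(latencies) - 1))] * 1000.0,
    }


def frame_budget_ms(target_fps: float) -> float:
    """Per-frame time budget needed to sustain `target_fps`."""
    return 1000.0 / target_fps
```

Usage would be along the lines of `measure_fps(lambda: model(batch))`, running the same helper for the SavedModel, ONNX and TensorRT variants.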
I also tried converting the models to quantized TFLite format for faster inference, but got stuck along the way with a segmentation fault.
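For comparison, full-integer post-training quantization with the TFLite converter usually follows the pattern below. This is a sketch assuming TensorFlow 2.x, a SavedModel that the converter can handle (for TF2 detection models this generally means exporting with the Object Detection API's `export_tflite_graph_tf2.py` first), and calibration images already preprocessed the same way as at inference time. The helper names and the float32 NHWC input are assumptions.

```python
"""Sketch: post-training quantization with the TFLite converter (assumes TF 2.x)."""
from typing import Iterator, List

import numpy as np


def representative_dataset(images: np.ndarray) -> Iterator[List[np.ndarray]]:
    """Yield one calibration sample at a time, shaped [1, H, W, C] float32."""
    for img in images:
        yield [img[np.newaxis, ...].astype(np.float32)]


def quantize_to_tflite(saved_model_dir: str, calib_images: np.ndarray) -> bytes:
    import tensorflow as tf  # imported lazily; only needed for the conversion

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # The converter calls this with no arguments, so bind the images in a lambda.
    converter.representative_dataset = lambda: representative_dataset(calib_images)
    return converter.convert()
```

A few hundred representative frames are typically enough for calibration; the returned bytes can be written straight to a `.tflite` file.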
Running inference on the ONNX models gave me lower FPS than the SavedModel format with TensorFlow. So, as a last effort, I tried to run inference on multiple images at once: I passed a batch of 4 images in one array and changed the ONNX model's input shape from [1,320,320,3] to [4,320,320,3] (and the output shapes accordingly) using the onnx.tools.update_model_dims.update_inputs_outputs_dims function. Strangely, this function doesn't allow changing a dimension that already has a value, which in my case is the batch size of 1.
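Since `update_inputs_outputs_dims` refuses to overwrite a dimension that already has a fixed value, one workaround is to edit the first dimension of the graph's inputs and outputs directly in the protobuf. The sketch below assumes the `onnx` package, NHWC inputs as in the post, and hypothetical helper names. Changing only the declared dims does not rewrite internal nodes (e.g. `Reshape` targets) that tf2onnx may have baked with a batch of 1, so re-exporting with a dynamic or fixed batch of 4 from the start may be more reliable. The batching helper just stacks four frames into one `[4, 320, 320, 3]` array.

```python
"""Sketch: stack frames into a batch and override a fixed ONNX batch dimension."""
from typing import Sequence, Union

import numpy as np


def make_batch(frames: Sequence[np.ndarray]) -> np.ndarray:
    """Stack H x W x C frames into an N x H x W x C array (NHWC, as in the post)."""
    return np.stack(frames, axis=0)


def set_batch_dim(model_path: str, out_path: str, batch: Union[int, str] = 4) -> None:
    import onnx  # imported lazily; only needed for the graph edit

    model = onnx.load(model_path)
    # Older exports may list initializers as graph inputs; leave those alone.
    initializer_names = {t.name for t in model.graph.initializer}
    for value_info in list(model.graph.input) + list(model.graph.output):
        if value_info.name in initializer_names:
            continue
        dims = value_info.type.tensor_type.shape.dim
        if len(dims) == 0:
            continue  # scalar tensor, no batch axis to change
        if isinstance(batch, int):
            dims[0].dim_value = batch  # dim_value/dim_param are a oneof: this clears dim_param
        else:
            dims[0].dim_param = batch  # symbolic name such as "batch" for a dynamic batch
    onnx.checker.check_model(model)
    onnx.save(model, out_path)
```

Using a symbolic name like `"batch"` instead of a fixed 4 keeps the model usable for other batch sizes, provided the runtime supports dynamic shapes.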