I have been trying to get faster (real-time) inference for my custom Mask-RCNN model on the Jetson Xavier. Right now, setup takes around 15 s and each inference then takes 10 s per frame, which is very slow. I understand that DeepStream can help speed up the data pipeline and that TensorRT can speed up inference through model optimisation.
1) What is the entire procedure for getting faster inference?
2) What options do I have?
3) Regarding TensorRT, I am aware that it provides the TRT inference engine, ONNX parsers, UFF parsers, etc. Is there a way to select what works for me?
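For reference, my current understanding of the ONNX route is that something like the following would build an engine with trtexec. This is only a sketch: the model.onnx file name, the engine name and the workspace size are placeholders, not values from my actual export.

```bash
# Minimal sketch: build an FP16 TensorRT engine from an ONNX export with trtexec.
# model.onnx, the engine file name and the workspace size are placeholders.
trtexec --onnx=model.onnx \
        --saveEngine=model_fp16.engine \
        --fp16 \
        --workspace=2048
```

As far as I know, trtexec also reports per-inference latency at the end of the run, which would give a first idea of the achievable speed-up.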
TLT has been designed to integrate with DeepStream SDK, so models trained with TLT will work out of the box with it.
To deploy a model trained by TLT to DeepStream, you have two options:
Option 1: Integrate the .etlt model directly into the DeepStream app. The .etlt file is generated by the export step.
Option 2: Generate a device-specific optimized TensorRT engine using tlt-converter. The generated TensorRT engine file can also be ingested by DeepStream.
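If I understand Option 2 correctly, the conversion would run on the target device and look roughly like this. The key, the -d input dimensions, the -o output node names and the precision below are assumptions taken from memory of the TLT MaskRCNN docs, so they need to be verified against `tlt-converter -h` and the model's documentation.

```bash
# Sketch only: generate a device-specific TensorRT engine from the exported .etlt.
# $NGC_KEY, the -d dims, the -o output node names and the -t precision are assumptions;
# verify them against the TLT MaskRCNN docs and `tlt-converter -h`.
tlt-converter model.etlt \
    -k $NGC_KEY \
    -d 3,832,1344 \
    -o generate_detections,mask_head/mask_fcn_logits/BiasAdd \
    -t fp16 \
    -m 1 \
    -e maskrcnn_fp16.engine
```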
For this, I will need to train my Mask-RCNN model from scratch with TLT (which produces a .tlt model) and then export it to .etlt. Integration with DeepStream is explained well here. 🤠
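From the samples, my understanding is that for Option 1 the .etlt is referenced from the nvinfer config of the DeepStream app, roughly as below. The key names are what I recall from the deepstream_tlt_apps MaskRCNN sample; the paths, model key, class count and parser function name are placeholders and should be checked against the shipped sample config.

```
# Sketch of the nvinfer [property] keys used to load a TLT-exported MaskRCNN .etlt,
# based on the deepstream_tlt_apps sample. Paths, key, class count and parser
# function name are placeholders; verify against the actual sample config.
[property]
tlt-encoded-model=models/mask_rcnn.etlt
tlt-model-key=<NGC key used at export time>
model-engine-file=models/mask_rcnn.etlt_b1_gpu0_fp16.engine
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=2
# network-type 3 = instance segmentation
network-type=3
num-detected-classes=2
output-instance-mask=1
parse-bbox-instance-mask-func-name=NvDsInferParseCustomMrcnnTLT
custom-lib-path=post_processor/libnvds_infercustomparser_tlt.so
```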
I am planning to complete these steps locally on my dGPU system using the DeepStream 5.0 Docker container and then migrate to the Jetson. Is this recommended?
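For reference, I would start the container on the dGPU host roughly like this; the image tag is an assumption on my part and should be checked against NGC. One caveat I am aware of: since TensorRT engines are device-specific, any engine built inside this container would have to be regenerated on the Xavier rather than copied over.

```bash
# Sketch: start the DeepStream 5.0 devel container on the x86 dGPU host.
# The image tag is an assumption -- check NGC for the exact DeepStream 5.0 tag.
docker run --gpus all -it --rm \
    -v /path/to/my/tlt_experiments:/workspace/experiments \
    nvcr.io/nvidia/deepstream:5.0-20.07-devel
```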