Realtime Object Detection Demo on AGX Xavier

Description

Hi everyone,

I’d like to share our new work, CenterNet-HarDNet85, which achieves 42.5 COCO mAP and runs at nearly real-time speed on AGX Xavier (JetPack 4.3, MAX-N mode). The repo is HERE.

We converted the PyTorch model to TensorRT (TRT) through torch2trt in FP16 mode, and the current speed is around 21 FPS (512x512 input, network inference time only). Is there anything else we can do to further improve the inference speed? Any advice is appreciated, and you are also very welcome to contribute to this repo.
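For anyone who wants to compare numbers, here is a minimal sketch of how a "network inference time only" FPS figure could be measured. The helper name `measure_fps` and its parameters are hypothetical and not taken from the repo. The `sync` argument is an assumption for GPU backends, where something like `torch.cuda.synchronize` would be passed so that asynchronous kernels are finished before the clock is read.

```python
import time


def measure_fps(infer, warmup=10, iters=100, sync=None):
    """Return frames per second for a zero-argument callable `infer`.

    `infer` is assumed to run one forward pass on a pre-loaded input,
    so pre- and post-processing are excluded from the timing.
    `sync` is an optional zero-argument callable that blocks until all
    queued device work is done (hypothetical hook for GPU backends).
    """
    # Warm-up runs are not timed: first calls often include lazy
    # initialization and would skew the average.
    for _ in range(warmup):
        infer()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        infer()
    if sync is not None:
        sync()  # make sure the last forward pass has really finished
    elapsed = time.perf_counter() - start

    return iters / elapsed
```

As a sanity check on the units, 21 FPS corresponds to roughly 1000 / 21 ≈ 47.6 ms per frame.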

We have also encountered some issues on JetPack 4.4 while converting the model to TRT. If anyone knows how to solve them, please share with us. Thank you very much!

Environment

TensorRT Version: JetPack 4.3
GPU Type: NVIDIA AGX Xavier
Nvidia Driver Version: JetPack 4.3
CUDA Version: JetPack 4.3
CUDNN Version: JetPack 4.3
Operating System + Version: JetPack 4.3
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.5.0
Baremetal or Container (if container which image + tag):


Hi @kennyp875l3kab,

This is surely of great help!
Also, please check the best practices for improving the performance of your engine:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-700/tensorrt-best-practices/index.html#optimize-performance

For issues related to JetPack, the Jetson forum will be able to assist you better.
Thanks!

Hi guys, I just wanted to share that the issue has been solved. The cause was the GatherElements operator, which the PyTorch ONNX export generates for torch.gather() and which TensorRT does not support. I followed the solution posted Here, and the ONNX model exported by PyTorch can now be converted to TRT. The whole model, including bbox decoding, is now converted to TRT, so the speed is a little faster than before.
The 512x512 FP16 model now achieves 43.2 COCO mAP @ 21 FPS on Xavier. Feel free to contribute to the repo if you have any ideas to make it faster. Thank you very much!
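For anyone hitting the same GatherElements error, one possible workaround can be sketched as follows. This is only an illustration and not necessarily the linked fix: it assumes the CenterNet-style case where a [B, N, C] feature tensor is gathered along dim 1 with a [B, K] index, and replaces torch.gather() with flat indexing through index_select(), which to my understanding is exported as a plain ONNX Gather. The function names are hypothetical.

```python
import torch


def gather_feat_reference(feat, ind):
    """Original pattern: feat [B, N, C], ind [B, K] (int64) -> [B, K, C]."""
    idx = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), feat.size(2))
    return feat.gather(1, idx)


def gather_feat_index_select(feat, ind):
    """Same result without torch.gather (hypothetical replacement)."""
    B, N, C = feat.shape
    K = ind.size(1)
    # Shift each batch's indices so they address rows of the flattened
    # [B * N, C] tensor: batch b starts at row b * N.
    offsets = torch.arange(B, device=ind.device, dtype=ind.dtype).unsqueeze(1) * N
    flat_ind = (ind + offsets).reshape(-1)
    # index_select picks whole rows, so no per-element index is needed.
    out = feat.reshape(B * N, C).index_select(0, flat_ind)
    return out.reshape(B, K, C)
```

Both functions should return identical tensors, so the replacement can be checked against the original on random inputs before re-exporting to ONNX.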


Wonderful! Thanks so much for the code.