How to perform inference using a serialized TensorRT engine (*.plan) on Jetson Nano?

I have a serialized TensorRT engine (*.plan) file that I’ve created from a pre-trained PyTorch RetinaNet model, which I’ve further fine-tuned on a custom dataset. The model and code I’ve based this on are provided by NVIDIA here.

This NVIDIA RetinaNet model is intended to be run within the NVIDIA PyTorch Docker container. However, that isn’t an option on a Jetson Nano, since nvidia-docker is not yet supported on ARM64. So in order to use this model on the Jetson Nano I need to perform inference with the TensorRT engine outside of the Docker container. I’ve not yet found documentation that clearly explains how to do this. (For example, this guide is about as clear as mud to a rookie like me.)
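For concreteness, here is roughly what I imagine the standalone inference would look like using the TensorRT Python API plus PyCUDA. This is only a sketch of my understanding, not working code: I’m assuming TensorRT 7.x on a recent JetPack (older versions use `execute_async` with an implicit batch instead), that the Python bindings and PyCUDA are installed on the Nano, that the engine has fixed input shapes, and that the first binding is the input. The file name `retinanet.plan` is just a placeholder for my engine.

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine from the .plan file
with open("retinanet.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

# Allocate pinned host buffers and device buffers for every binding
host_bufs, dev_bufs, bindings = [], [], []
for binding in engine:
    shape = engine.get_binding_shape(binding)              # assumes fixed shapes
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host_mem = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev_mem = cuda.mem_alloc(host_mem.nbytes)
    host_bufs.append(host_mem)
    dev_bufs.append(dev_mem)
    bindings.append(int(dev_mem))

def infer(image_chw):
    """Run one inference on a preprocessed float32 array matching the input binding."""
    np.copyto(host_bufs[0], image_chw.ravel())              # assumes binding 0 is the input
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for h, d in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(h, d, stream)
    stream.synchronize()
    return [np.array(h) for h in host_bufs[1:]]
```

Is this roughly the right shape of solution, or am I missing a step?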

My goal is to read image frames from a video stream and use the model to perform inference on each frame for object detection. I have this working as planned on a laptop using the fine-tuned PyTorch RetinaNet (*.pth) model, and TensorRT on the Jetson Nano is my next frontier.
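The per-frame loop I have in mind would look something like the sketch below, reusing the `infer()` helper from above. The stream URL, input size, and preprocessing here are placeholders; they would need to match whatever the engine was exported with.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("rtsp://camera/stream")  # or a device index / file path
INPUT_W, INPUT_H = 1280, 800                    # placeholder; must match the engine's input binding

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Resize, convert BGR->RGB, scale to [0,1], and reorder to CHW with a batch dimension
    img = cv2.resize(frame, (INPUT_W, INPUT_H))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    chw = np.transpose(img, (2, 0, 1))[None]
    outputs = infer(chw)
    # 'outputs' are the raw engine outputs (e.g. scores/boxes/classes); decoding them
    # into detections depends on how the engine was exported.

cap.release()
```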

Thanks in advance for any comments or suggestions.

Hi, did you end up succeeding with using a TensorRT engine file to perform inference on multiple frames?

Not yet, because I haven’t worked out how to convert the various models I’ve trained on my custom dataset into TensorRT engine files. These have typically been PyTorch or Keras/TensorFlow models, and TensorRT seems to work most easily with Caffe-based models. How to get TensorRT engine files for my custom-trained models that I can then use with the DeepStream SDK is still unclear to me. Is there a simple recipe for mortals? This seems to be poorly documented, and/or I just haven’t yet found a good guide that I can follow to the end for my situation.
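For what it’s worth, the rough recipe I keep seeing suggested (but haven’t followed through to the end myself) is to export the PyTorch model to ONNX and then build the engine with trtexec on the Nano itself. Something like the sketch below, where the input shape, tensor names, and file names are just placeholders for my model:

```python
import torch

model = ...  # the fine-tuned PyTorch model
model.eval()

dummy = torch.randn(1, 3, 800, 1280)  # placeholder input shape
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)

# Then on the Jetson Nano (trtexec ships with TensorRT):
#   trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```

If anyone can confirm whether this is the intended path for custom PyTorch or Keras/TensorFlow models, or point to a guide that walks through it end to end, I’d appreciate it.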