trt_pose ONNX performance issues

First, I’d like to say how great the trt_pose git project is.

Working on a TX2 (8 GB), I am seeing inference times of around 10 ms.

Oddly, when I export the model to ONNX and load it into a C++ TensorRT project (based on the MNIST ONNX sample), inference time goes up to 40+ ms, roughly 4 times worse.

Is there any way to deploy the model in C++ without sacrificing performance?

Ubuntu 18.04
JetPack 4.2.2
TensorRT 5.1.1
CUDA 10.0.326
cuDNN 7.5.0


Instead of parsing the ONNX model every time, I would recommend generating a serialized TensorRT engine file once, and then loading that engine by deserializing it for inference.
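A minimal sketch of the file-caching part of this advice, assuming the engine is stored as a plain binary file on disk. The helper names `saveEngineBlob`, `loadEngineBlob` and `engineCacheExists` are hypothetical. The code uses only the C++ standard library, so it compiles without TensorRT. The comments mark where, as I understand the TensorRT 5 C++ API, the bytes would come from (`engine->serialize()`) and where they would go (`runtime->deserializeCudaEngine(...)`).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Write a serialized engine blob to disk.
// In a TensorRT 5 program the bytes would come from engine->serialize(),
// which returns an IHostMemory* exposing data() and size().
void saveEngineBlob(const std::string& path, const void* data, std::size_t size) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path + " for writing");
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    if (!out) throw std::runtime_error("write failed for " + path);
}

// Read the whole blob back into memory.
// The result would then be handed to
// runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr),
// where the last argument is the (optional) plugin factory in TensorRT 5.
std::vector<char> loadEngineBlob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path + " for reading");
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// True if a cached engine file can be opened, i.e. the ONNX parsing and
// engine-building step can be skipped on this run.
bool engineCacheExists(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return in.good();
}
```

On the first run, build the engine from ONNX as in the MNIST sample, serialize it and save the blob. On later runs, check `engineCacheExists` and deserialize the saved blob instead. This mainly avoids the parsing and engine-building cost at startup. Keep in mind that a serialized engine is tied to the TensorRT version and GPU it was built on.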

You can also use the trtexec command-line tool for testing and debugging: