Performance DECREASE with tensorRT under onnxruntime

Hi All,

I’m working on deploying an ONNX image classifier (Inception) on a Jetson AGX Xavier. I’ve gotten it to run with onnxruntime in a Docker container, using both the CUDAExecutionProvider and the TensorrtExecutionProvider.

I was expecting a speed-up from using TensorRT with my models. Instead I’m seeing a significant (15-20x) slowdown. What am I missing?

The table below shows the time in seconds it took to run an inception_v3 and an inception_v4 model on 100 images, using CUDAExecutionProvider and TensorrtExecutionProvider respectively. The models were trained and converted to ONNX using PyTorch on a different computer. The runs were executed through Docker on the Jetson AGX device in MAXN mode.
Using jtop, I can see that with CUDAExecutionProvider the GPU is always fully engaged, while with TensorrtExecutionProvider the GPU is only intermittently engaged, like it’s sputtering.

      inception_v3  inception_v4
CUDA           11s           16s
TRT           223s          257s

So the best speed I’m getting is ~9 img/sec (inception_v3 with CUDA). Shouldn’t I be able to crank out more frames per second?
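For reference, the throughput and slowdown figures follow directly from the table above. The short sketch below only redoes that arithmetic; the function name `summarize` and the dictionary layout are illustrative choices, not part of any benchmark tooling.

```python
# Timings in seconds for 100 images, copied from the table above.
N_IMAGES = 100
timings = {
    "inception_v3": {"CUDA": 11, "TRT": 223},
    "inception_v4": {"CUDA": 16, "TRT": 257},
}


def summarize(timings, n_images=N_IMAGES):
    """Return img/sec per provider and the TRT-vs-CUDA slowdown for each model."""
    out = {}
    for model, t in timings.items():
        out[model] = {
            "cuda_img_per_s": n_images / t["CUDA"],
            "trt_img_per_s": n_images / t["TRT"],
            "slowdown": t["TRT"] / t["CUDA"],
        }
    return out


for model, s in summarize(timings).items():
    print(f"{model}: CUDA {s['cuda_img_per_s']:.2f} img/s, "
          f"TRT {s['trt_img_per_s']:.2f} img/s, slowdown {s['slowdown']:.1f}x")
```

This works out to roughly 9 img/sec for inception_v3 with CUDA, and a TensorRT slowdown of roughly 20x for inception_v3 and 16x for inception_v4.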

If you need more details to dig into the specifics, let me know!
Thanks for your help!

There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.


Sorry for the late update and thanks for opening a new topic.

We want to reproduce this issue internally.
Would you mind sharing the ONNX model and a simple script to reproduce the CUDA and TensorRT results?
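A reproduction script along these lines might look like the following sketch. It is not the original poster's code: the helper names `benchmark` and `make_runner`, the random input, and the 1x3x299x299 input shape (the usual Inception input size) are assumptions and should be adjusted to the actual model. Warm-up calls are timed separately from the measured loop, so that any one-time setup cost inside an execution provider does not get mixed into the per-image figure.

```python
import sys
import time


def benchmark(run_once, n_warmup=5, n_timed=100):
    """Time warm-up calls and steady-state calls of run_once() separately."""
    t0 = time.perf_counter()
    for _ in range(n_warmup):
        run_once()
    warmup_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(n_timed):
        run_once()
    timed_s = time.perf_counter() - t0

    img_per_s = n_timed / timed_s if timed_s > 0 else float("inf")
    return {"warmup_s": warmup_s, "timed_s": timed_s, "img_per_s": img_per_s}


def make_runner(model_path, provider, input_shape=(1, 3, 299, 299)):
    """Build a zero-argument callable that runs one inference with one provider.

    input_shape is an assumption; check the model's real input with
    session.get_inputs()[0].shape.
    """
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=[provider])
    input_name = session.get_inputs()[0].name
    x = np.random.rand(*input_shape).astype(np.float32)  # dummy image batch
    return lambda: session.run(None, {input_name: x})


def main(model_path):
    for provider in ("CUDAExecutionProvider", "TensorrtExecutionProvider"):
        result = benchmark(make_runner(model_path, provider))
        print(provider, result)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # path to the .onnx file, passed on the command line
```

If the TensorRT warm-up time turns out to be much larger than the timed loop, one possibility worth ruling out is that engine building on the first runs is being counted in the 100-image total.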

