I’m working on putting an ONNX-format image classifier NN model (Inception) on a Jetson AGX Xavier. I’ve gotten it to work with onnxruntime in a Docker container with
I was expecting a speed-up from using TensorRT with my models. Instead I’m seeing a significant (15-20x) slowdown. What am I missing?
The table below shows how many seconds it took to run an inception_v3 and an inception_v4 model on 100 images, once with CUDAExecutionProvider and once with TensorrtExecutionProvider. The models were trained in PyTorch and exported to ONNX on a different computer. The runs are executed through Docker on the Jetson AGX device in MAXN power mode.
Using jtop, I can see that with CUDAExecutionProvider the GPU is fully engaged the whole time, while with TensorrtExecutionProvider it is only intermittently engaged, as if it’s sputtering.
Time for 100 images (s)   inception_v3   inception_v4
CUDA                      11             16
TensorRT                  223            257
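As a quick sanity check on the numbers in the table above, the sketch below converts each timing into images per second and computes the TensorRT/CUDA slowdown ratio. The dictionary layout and the helper names `throughput` and `summarize` are just illustrative choices; the timings themselves are copied from the table.

```python
# Timings in seconds for 100 images, copied from the table above.
NUM_IMAGES = 100
timings = {
    "inception_v3": {"CUDA": 11.0, "TensorRT": 223.0},
    "inception_v4": {"CUDA": 16.0, "TensorRT": 257.0},
}


def throughput(seconds: float, images: int = NUM_IMAGES) -> float:
    """Images processed per second for a run of `images` images."""
    return images / seconds


def summarize(table: dict) -> dict:
    """Map each model to (CUDA img/s, TensorRT img/s, TensorRT/CUDA slowdown)."""
    summary = {}
    for model, t in table.items():
        cuda, trt = t["CUDA"], t["TensorRT"]
        summary[model] = (throughput(cuda), throughput(trt), trt / cuda)
    return summary


if __name__ == "__main__":
    for model, (cuda_ips, trt_ips, slowdown) in summarize(timings).items():
        print(f"{model}: CUDA {cuda_ips:.2f} img/s, "
              f"TensorRT {trt_ips:.2f} img/s, slowdown {slowdown:.1f}x")
```

This gives about 9.09 img/s (CUDA) vs 0.45 img/s (TensorRT) for inception_v3, a ~20.3x slowdown, and 6.25 img/s vs 0.39 img/s for inception_v4, a ~16.1x slowdown.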
So the best speed I’m getting is about 9 img/s (100 images in 11 s with CUDA). Shouldn’t I be able to get more frames per second than that?
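To see whether the TensorRT runs are uniformly slow or dominated by a few very slow iterations (which would fit the sputtering GPU usage seen in jtop), it may help to time each inference separately and keep the first few iterations apart from the rest. The sketch below is stdlib-only Python; `time_per_iteration`, its `warmup` parameter, and the result keys are hypothetical names for illustration, not part of onnxruntime. It assumes `run` performs one inference on one input.

```python
import statistics
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def time_per_iteration(run: Callable[[T], object], inputs: Sequence[T],
                       warmup: int = 5) -> dict:
    """Time `run` on each input, reporting warm-up and steady state separately.

    The first `warmup` calls are timed but excluded from the steady-state
    statistics, so one-off startup costs do not skew the average.
    """
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        run(x)  # one inference (hypothetical callable supplied by the caller)
        latencies.append(time.perf_counter() - start)

    warm, steady = latencies[:warmup], latencies[warmup:]
    result = {
        "latencies_s": latencies,          # every iteration, in order
        "warmup_total_s": sum(warm),       # time spent in the warm-up calls
        "steady_count": len(steady),
    }
    if steady:
        total = sum(steady)
        result["steady_mean_s"] = statistics.mean(steady)
        result["steady_max_s"] = max(steady)
        result["steady_img_per_s"] = len(steady) / total if total > 0 else float("inf")
    return result
```

With onnxruntime, `run` could be something like `lambda img: session.run(None, {session.get_inputs()[0].name: img})`, where `session` is an `onnxruntime.InferenceSession` created with the TensorRT provider. If most of the time lands in the warm-up iterations rather than the steady state, a one-off startup cost (for example TensorRT building its engine) would be a plausible suspect; that is an assumption to check, not something the timings above establish.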
If you need more details to dig into the specifics, let me know!
Thanks for your help!