Trouble building onnxruntime with tensorrt

Hi All, first time poster~

I’m trying to build onnxruntime with TensorRT support on my Jetson AGX Xavier with JetPack 4.6. I’m following the instructions on this page: Build with different EPs - onnxruntime, but my build fails. The most common error is:

onnxruntime/gsl/gsl-lite.hpp(1959): warning: calling a __host__ function from a __host__ __device__ function is not allowed

I’ve tried with the latest CMake version, 3.22.1, and with version 3.21.1 as mentioned on the website.

See attachment for the full text log.
jetstonagx_onnxruntime-tensorrt_install.log (168.6 KB)

The end goal of this build is to create a .whl binary to then use as part of the installation process of another program in a docker container. Any help and insight is appreciated, thank you!
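For context, my build invocation looks roughly like the following sketch (the paths are assumptions for a stock JetPack install and may not match my exact command; see the attached log for the real one):

```shell
# Sketch of a TensorRT-enabled onnxruntime wheel build on Jetson.
# CUDA/cuDNN/TensorRT paths assume a stock JetPack 4.6 install;
# verify them on your own device before running.
./build.sh --config Release --update --build --build_wheel \
    --use_tensorrt \
    --cuda_home /usr/local/cuda \
    --cudnn_home /usr/lib/aarch64-linux-gnu \
    --tensorrt_home /usr/lib/aarch64-linux-gnu
```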



You can find some prebuilt packages for JetPack 4.6 at the link below:
Does it meet your requirements, or do you want to build from source?


I started off there, and found the link I included in my first post on that page too, under “Build from Source”. The prebuilt wheels work fine, but they do not include the TensorRT backend. I’m trying to build onnxruntime so that it includes the TensorRT backend. Has anyone else tried or achieved this? Does anything in the attached build log stand out?


Thanks for your feedback.

Ideally, it should work.
We are going to try to reproduce this issue first and will share more information with you later.


We just double-checked the wheel package shared on the eLinux page.
With v1.10.0 + JetPack 4.6, we can run onnxruntime with the TensorrtExecutionProvider successfully.

Would you mind giving it a try?

$ python3
Python 3.6.9 (default, Dec  8 2021, 21:08:43)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime as ort
>>> sess = ort.InferenceSession('/usr/src/tensorrt/data/mnist/mnist.onnx', providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'])
2022-01-25 03:08:29.992372812 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-01-25 08:08:29 WARNING] /home/onnxruntime/onnxruntime-py36/cmake/external/onnx-tensorrt/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-01-25 03:08:31.460957667 [W:onnxruntime:Default, tensorrt_execution_provider.h:53 log] [2022-01-25 08:08:31 WARNING] Detected invalid timing cache, setup a local cache instead


Hi AastaLLL,

This helped! I was using the wheel for ort v1.8.0. The latest v1.10.0 wheel for Jetson seems to include the TensorRT provider out of the box. Thank you!

I was expecting a speed-up from using TRT with my models. Instead I’m seeing a significant (15-20x) slowdown. What am I missing? (Please let me know if I should make a new post for this question continuation).

The following runs show the seconds it took to run an inception_v3 and an inception_v4 model on 100 images using the CUDAExecutionProvider and the TensorrtExecutionProvider, respectively. The models were trained and converted to ONNX with PyTorch on a different computer. The runs are executed through Docker on the Jetson AGX device in MAXN mode.
Using jtop, I can see that with the CUDAExecutionProvider the GPU is always fully engaged, while with the TensorrtExecutionProvider the GPU is only intermittently engaged, like it’s sputtering.

      inception_v3  inception_v4
CUDA           11s           16s
TRT           223s          257s

So the best throughput I’m getting is ~9 img/sec. Shouldn’t I be able to crank out more frames per second?
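In case the methodology matters, I’m timing roughly like the sketch below (`run_once` is a hypothetical stand-in for a call like `sess.run(None, feeds)`). I understand TensorRT builds its engine on or before the first inference, so the warm-up iterations are meant to keep that one-time cost out of the numbers:

```python
import time

def benchmark(run_once, n_iters=100, warmup=3):
    """Time n_iters calls of run_once, excluding warm-up calls.

    run_once is a zero-argument callable performing one inference.
    The warm-up loop absorbs one-time costs (e.g. TensorRT engine
    building on the first call) before timing starts.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    elapsed = time.perf_counter() - start
    # Return total seconds and throughput in images/sec.
    return elapsed, n_iters / elapsed
```

For example, `benchmark(lambda: sess.run(None, {"input": batch}))`, where `sess` and the input name come from your own model.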


Yes, it would be good to open a new topic for the performance issue.

Ideally, you should see some acceleration when deploying with TensorRT.
Let’s dig into this in depth in the new topic.
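One thing worth checking in the meantime: the “invalid timing cache” warning in your earlier log suggests the TensorRT engine is being rebuilt. A sketch along these lines (option names per onnxruntime’s TensorRT Execution Provider options; the cache path is just an example) enables the engine cache so the build cost is paid once:

```python
# Hypothetical sketch: pass TensorRT EP options so the built engine is
# cached on disk and reused across sessions, instead of being rebuilt.
providers = [
    ('TensorrtExecutionProvider', {
        'trt_engine_cache_enable': True,
        'trt_engine_cache_path': '/tmp/trt_cache',  # example path
    }),
    'CUDAExecutionProvider',  # fallback for unsupported nodes
]

# With onnxruntime installed, the session would be created like:
# import onnxruntime as ort
# sess = ort.InferenceSession('model.onnx', providers=providers)
```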


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.