Unable to use TensorRT Execution Provider on Jetson AGX Xavier

I have onnxruntime-gpu 1.16.0, l4t-tensorrt 8.5.2.2, and CUDA 11.4.

I am on a Jetson AGX Xavier trying to decrease the inference time of an ONNX model by using the GPU. However, the model performs significantly worse on the GPU when using CUDAExecutionProvider. This is how I create the session:

nxrun.InferenceSession(onnx_model_path, sess_options=so, providers=[("TensorrtExecutionProvider", {"trt_fp16_enable": True}), ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"})])
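
To double-check which providers ONNX Runtime actually registers, I also print them right after creating the session. A minimal sketch (assigning the session to sess is my own addition; onnx_model_path and so are defined as in the call above):

import onnxruntime as nxrun

# Same call as above, just assigned to a variable so it can be inspected
sess = nxrun.InferenceSession(onnx_model_path, sess_options=so, providers=[("TensorrtExecutionProvider", {"trt_fp16_enable": True}), ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"})])

# ONNX Runtime falls back to the remaining providers if one cannot be loaded,
# so this shows whether TensorrtExecutionProvider was actually kept
print(sess.get_providers())
print(sess.get_provider_options())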

I have tried the following:

  1. Converting the model from FP32 to FP16, and even trying AMP (https://developer.nvidia.com/blog/end-to-end-ai-for-nvidia-based-pcs-optimizing-ai-by-transitioning-from-fp32-to-fp16/); see the FP16 conversion sketch after this list.
  2. Trying different flags with CUDAExecutionProvider.
  3. Following multiple NVIDIA blogs that try to decrease the inference time by converting to a .plan file.
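
For reference, this is roughly how I did the FP32-to-FP16 conversion (a minimal sketch using the onnxconverter-common package; the file names are placeholders):

import onnx
from onnxconverter_common import float16

# Load the FP32 model, cast weights/ops down to FP16, keep FP32 graph inputs/outputs
model_fp32 = onnx.load("model_fp32.onnx")
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)
onnx.save(model_fp16, "model_fp16.onnx")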

I am trying to use TensorrtExecutionProvider, but I am unable to load my model when I use this provider. The log output I get is:

jetsonserver-1 | 2024-03-14 20:39:12.224268543 [I:onnxruntime:Default, tensorrt_execution_provider_utils.h:520 TRTGenerateId] [TensorRT EP] Model name is model.onnx
jetsonserver-1 | 2024-03-14 20:39:14.826604646 [W:onnxruntime:Default, tensorrt_execution_provider.h:77 log] [2024-03-14 20:39:14 WARNING] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
jetsonserver-1 | 2024-03-14 20:39:15.086108633 [I:onnxruntime:Default, tensorrt_execution_provider.cc:1392 GetSubGraph] [TensorRT EP] TensorRT subgraph MetaDef name TRTKernel_graph_torch_jit_15612526831647503026_0
jetsonserver-1 | 2024-03-14 20:39:15.086976803 [I:onnxruntime:Default, tensorrt_execution_provider.cc:1392 GetSubGraph] [TensorRT EP] TensorRT subgraph MetaDef name TRTKernel_graph_torch_jit_15612526831647503026_0
jetsonserver-1 | 2024-03-14 20:39:15.087127754 [I:onnxruntime:Default, tensorrt_execution_provider.cc:1884 GetCapability] [TensorRT EP] Whole graph will run on TensorRT execution provider
jetsonserver-1 | 2024-03-14 20:39:16.117593029 [W:onnxruntime:Default, tensorrt_execution_provider.cc:2173 Compile] [TensorRT EP] Builder optimization level can only be used on TRT 8.6 onwards!

However, TensorRT 8.6 requires CUDA 11.8, which my Jetson AGX Xavier doesn't have / can't support (?). I am using JetPack SDK 5.1.2.

Guidance for the following would be much appreciated:
1) How can I use TensorrtExecutionProvider in my Docker container?
2) How can I decrease the inference time?

Maybe try upgrading your CUDA to 11.8?

When running apt-get install cuda, be careful to specify the version as well.

Hi,

It looks like your package is not compatible with the JetPack environment.
How did you install onnxruntime?

You can find the wheel for JetPack users in our eLinux wiki below:
https://elinux.org/Jetson_Zoo#ONNX_Runtime

Thanks.

I installed onnxruntime using the Jetson Zoo link you provided.

I was using version 1.16.0 for Python 3.8.

Bump.

My Dockerfile is attached below. My model fails to load and I am unable to run inference on it. The session is created like this:

so = nxrun.SessionOptions()
so.intra_op_num_threads = 4
so.log_severity_level = 3
nxrun.InferenceSession(onnx_model_path, sess_options=so, providers=["TensorrtExecutionProvider", ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"})])

ERROR: Gets stuck on 2024-03-22 14:22:43.238667386 [I:onnxruntime:Default, tensorrt_execution_provider.cc:1884 GetCapability] [TensorRT EP] Whole graph will run on TensorRT execution provider

If I use this instead (added trt_fp16_enable):

nxrun.InferenceSession(onnx_model_path, sess_options=so, providers=[("TensorrtExecutionProvider", {"trt_fp16_enable": True}), ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"})])

I get stuck where I originally showed.
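
One thing I still plan to try is enabling the TensorRT engine cache, since the long "stuck" phase looks like the engine build. A sketch based on the onnxruntime TensorRT EP options (trt_engine_cache_enable and trt_engine_cache_path; the cache path is an arbitrary directory inside my container, and I have not verified this on TRT 8.5 yet):

nxrun.InferenceSession(
    onnx_model_path,
    sess_options=so,
    providers=[
        ("TensorrtExecutionProvider", {
            "trt_fp16_enable": True,
            # Persist the built engine so later container runs skip the slow build step
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "/trt_cache",
        }),
        ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}),
    ],
)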

Any guidance is greatly appreciated.

Dockerfile.txt (3.0 KB)

Is this still an issue that needs support? Is there any result that can be shared?

Hi,

It is recommended to use TensorRT directly for inference, since it is optimized for both memory and performance.
You can run it with /usr/src/tensorrt/bin/trtexec --onnx=[model].

Please let us know if onnxruntime is preferred.
Thanks.

Hi @kayccc and @AastaLLL, thank you for the replies. I did use trtexec, but it seemed as though I was only able to do multiple benchmark runs with random input data. Is there a way to incorporate it into my Flask website that runs in a Docker container?

Hi,

You can build your container on top of the l4t-jetpack or l4t-tensorrt base images.
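
Inside the container, an engine saved by trtexec (e.g. with --saveEngine=model.plan) can then be loaded from Python and called from a Flask handler. An untested sketch using the binding-based TensorRT 8.5 Python API together with pycuda (model.plan, static shapes, and a single input/output are assumptions):

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine that trtexec wrote with --saveEngine
with open("model.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate pinned host buffers and device buffers for every binding
buffers, bindings = [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host_mem = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev_mem = cuda.mem_alloc(host_mem.nbytes)
    buffers.append((host_mem, dev_mem, engine.binding_is_input(i)))
    bindings.append(int(dev_mem))

def infer(input_array):
    # Copy input to the GPU, run the engine synchronously, copy the output back
    host_in, dev_in, _ = next(b for b in buffers if b[2])
    host_out, dev_out, _ = next(b for b in buffers if not b[2])
    np.copyto(host_in, np.asarray(input_array).ravel())
    cuda.memcpy_htod(dev_in, host_in)
    context.execute_v2(bindings)
    cuda.memcpy_dtoh(host_out, dev_out)
    return host_out

In a Flask app the engine and buffers would be created once at startup and infer() called from the request handler.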

Thanks.
