I have onnxruntime-gpu 1.16.0, l4t-tensorrt-, and CUDA 11.4.
I am on a Jetson AGX Xavier, trying to reduce the inference time of an ONNX model by running it on the GPU. However, the model runs significantly slower on the GPU when I use CUDAExecutionProvider.
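This is how I create the session:
```python
import onnxruntime as ort

onnx_model_path = "model.onnx"
so = ort.SessionOptions()

# Providers are tried in priority order: TensorRT first, then CUDA
sess = ort.InferenceSession(
    onnx_model_path,
    sess_options=so,
    providers=[
        ("TensorrtExecutionProvider", {"trt_fp16_enable": True}),
        ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}),
    ],
)
```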
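When comparing providers, it helps to exclude the first few `run()` calls, because they can include one-time setup (CUDA/cuDNN initialization, or TensorRT engine building) that makes the GPU look slower than it is in steady state. Below is a minimal, framework-agnostic timing sketch; the `benchmark` helper and its default counts are hypothetical, not part of ONNX Runtime.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 10, iters: int = 50) -> Dict[str, float]:
    """Time run_once() in milliseconds, excluding warm-up calls.

    The warm-up calls absorb one-time costs so they do not skew the
    steady-state numbers that are returned.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    # Warm-up: run but do not record.
    for _ in range(warmup):
        run_once()
    timings_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }
```

With a session created as above (called `sess` here), `benchmark(lambda: sess.run(None, feeds))`, where `feeds` is a hypothetical dict mapping input names to NumPy arrays, can be run once with `providers=["CPUExecutionProvider"]` and once with the GPU providers for a like-for-like comparison.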
I have tried the following:
- Converting the model from FP32 to FP16, and even to mixed precision (AMP), following https://developer.nvidia.com/blog/end-to-end-ai-for-nvidia-based-pcs-optimizing-ai-by-transitioning-from-fp32-to-fp16/
- Trying different CUDAExecutionProvider options.
- Following several NVIDIA blog posts on converting the model to a TensorRT .plan file to reduce inference time.
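As an alternative to building a .plan file by hand, the TensorRT execution provider can cache the engine it builds through the `trt_engine_cache_enable` and `trt_engine_cache_path` provider options, so later session creations reuse it instead of rebuilding. The sketch below builds such a providers list, assuming those options behave as documented for onnxruntime-gpu 1.16; `build_providers` and the cache directory are hypothetical.

```python
import os
from typing import Dict, List, Tuple, Union

# Each entry is either a provider name or a (name, options) tuple,
# which is the form onnxruntime.InferenceSession accepts for `providers`.
ProviderSpec = Union[str, Tuple[str, Dict[str, object]]]


def build_providers(engine_cache_dir: str, fp16: bool = True) -> List[ProviderSpec]:
    """Providers list: TensorRT (with engine caching), then CUDA, then CPU."""
    # Make sure the cache directory exists before handing it to the TensorRT EP.
    os.makedirs(engine_cache_dir, exist_ok=True)
    trt_options: Dict[str, object] = {
        "trt_fp16_enable": fp16,
        # Serialize built engines to disk and reuse them on later session creations.
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": engine_cache_dir,
    }
    cuda_options: Dict[str, object] = {"cudnn_conv_algo_search": "DEFAULT"}
    return [
        ("TensorrtExecutionProvider", trt_options),
        ("CUDAExecutionProvider", cuda_options),
        # Explicit CPU fallback; ONNX Runtime adds it implicitly anyway.
        "CPUExecutionProvider",
    ]
```

Usage would look like `onnxruntime.InferenceSession(onnx_model_path, sess_options=so, providers=build_providers("/workspace/trt_cache"))`. The path is only an example; mounting it as a Docker volume would keep the cache across container restarts. Cached engines are specific to the model, precision settings and GPU, so the directory should be cleared if any of those change.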
I am trying to use TensorrtExecutionProvider, but the model fails to load when I add it to the session's providers. This is the log output I get:
```
jetsonserver-1 | 2024-03-14 20:39:12.224268543 [I:onnxruntime:Default, tensorrt_execution_provider_utils.h:520 TRTGenerateId] [TensorRT EP] Model name is model.onnx
jetsonserver-1 | 2024-03-14 20:39:14.826604646 [W:onnxruntime:Default, tensorrt_execution_provider.h:77 log] [2024-03-14 20:39:14 WARNING] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
jetsonserver-1 | 2024-03-14 20:39:15.086108633 [I:onnxruntime:Default, tensorrt_execution_provider.cc:1392 GetSubGraph] [TensorRT EP] TensorRT subgraph MetaDef name TRTKernel_graph_torch_jit_15612526831647503026_0
jetsonserver-1 | 2024-03-14 20:39:15.086976803 [I:onnxruntime:Default, tensorrt_execution_provider.cc:1392 GetSubGraph] [TensorRT EP] TensorRT subgraph MetaDef name TRTKernel_graph_torch_jit_15612526831647503026_0
jetsonserver-1 | 2024-03-14 20:39:15.087127754 [I:onnxruntime:Default, tensorrt_execution_provider.cc:1884 GetCapability] [TensorRT EP] Whole graph will run on TensorRT execution provider
jetsonserver-1 | 2024-03-14 20:39:16.117593029 [W:onnxruntime:Default, tensorrt_execution_provider.cc:2173 Compile] [TensorRT EP] Builder optimization level can only be used on TRT 8.6 onwards!
```
However, TensorRT 8.6 appears to require CUDA 11.8, which my Jetson AGX Xavier doesn't have and, as far as I can tell, can't support. I am using JetPack SDK 5.1.2 (which ships with TensorRT 8.5.2 and CUDA 11.4).
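The last warning in the log comes from a check against TensorRT 8.6, and JetPack 5.1.2 ships TensorRT 8.5.x, so it can help to confirm which TensorRT version the container actually exposes. Below is a small sketch, assuming the TensorRT Python bindings (`tensorrt` module) may or may not be installed in the container; the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '8.5.2.2' -> (8, 5, 2, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version: str, major: int, minor: int) -> bool:
    """True if version is major.minor or newer (compares the first two fields only)."""
    return parse_version(version)[:2] >= (major, minor)


def installed_tensorrt_version() -> Optional[str]:
    """Return tensorrt.__version__, or None if the Python bindings cannot be imported."""
    try:
        import tensorrt  # TensorRT Python bindings, if present in the container
    except (ImportError, OSError):
        return None
    return tensorrt.__version__
```

Running `print(installed_tensorrt_version())` inside the container shows the version in use; on JetPack 5.1.2 it would be expected to report an 8.5 release, in which case `at_least(version, 8, 6)` is False.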
Guidance for the following would be much appreciated:
1) How can I use TensorrtExecutionProvider inside my Docker container?
2) How can I decrease the inference time?