Build onnxruntime-gpu wheel for JetPack 5 with CUDA and TensorRT

Description

I’ve been trying to run a model with onnxruntime-gpu on a Jetson AGX Orin Developer Kit running JetPack 5.0.1. I followed the guide at faxu.github.io/onnxinference (sorry, I can’t post a proper link as a new account) to build onnxruntime from source with CUDA and TensorRT support.

This is the build command I used:

./build.sh --config Release --update --build --parallel --build_wheel \
    --use_tensorrt --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
    --tensorrt_home /usr/lib/aarch64-linux-gnu

When I try to run the model with onnxruntime, TensorRT works fine, but CUDA does not.
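
As a first diagnostic (a minimal sketch, not part of my original notebook), the execution providers the wheel was built with can be listed like this:

import onnxruntime as ort

# A wheel built with --use_cuda and --use_tensorrt should list both
# 'TensorrtExecutionProvider' and 'CUDAExecutionProvider' here.
print(ort.get_available_providers())

Note that a provider can appear in this list and still fail later, when a CUDA kernel is actually launched.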

Environment

Device: Jetson AGX Orin Developer Kit
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
Container: l4t-ml:r34.1.1-py3
CUDA Version: 11.4
TensorRT Version: 8.4

Relevant Files

onnxruntime_gpu-1.13.0-cp38-cp38-linux_aarch64.whl (24.9 MB)

Steps To Reproduce

On my Jetson AGX Orin with JetPack 5 installed, I launch a Docker container with this command:

docker run -it --rm --runtime nvidia --network host -v test:/opt/test l4t-ml:r34.1.1-py3

Here is a snippet of the code I run in a notebook inside the container:


# install the onnxruntime-gpu wheel I built; you can find it attached to this post

!pip install onnxruntime_gpu-1.13.0-cp38-cp38-linux_aarch64.whl

import onnxruntime as ort
import numpy as np

providers = [
    ('TensorrtExecutionProvider', {
        'trt_fp16_enable': True,
    }),
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    })
]

img = np.zeros((3, 640, 640), dtype=np.float32)
session_trt = ort.InferenceSession("my_onnx_model.onnx", providers=providers)
ort_inputs = {session_trt.get_inputs()[0].name: img[None, :, :, :]}
out = session_trt.run(None, ort_inputs)

and I get this exception:

2022-07-26 16:16:12.161594665 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running Sigmoid node. Name:'Sigmoid_36' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device

---------------------------------------------------------------------------
Fail                                      Traceback (most recent call last)
Input In [16], in <cell line: 3>()
      1 session_trt = ort.InferenceSession("my_onnx_model.onnx", providers=providers)
      2 ort_inputs = {session_trt.get_inputs()[0].name: img[None, :, :, :]}
----> 3 out = session_trt.run(None, ort_inputs)

File /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:200, in Session.run(self, output_names, input_feed, run_options)
    198     output_names = [output.name for output in self._outputs_meta]
    199 try:
--> 200     return self._sess.run(output_names, input_feed, run_options)
    201 except C.EPFail as err:
    202     if self._enable_fallback:

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Sigmoid node. Name:'Sigmoid_36' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device

I run the same code on a Jetson Nano with JetPack 4.6 and onnxruntime-gpu 1.11 downloaded from Jetson_Zoo#ONNX_Runtime (sorry, I can’t post the link since I’m a new user), and everything works fine.

I’d like to be able to use the CUDA execution provider to test more ONNX models and compare their performance across the different execution providers on my Jetson Orin.

If I run my PyTorch model using CUDA, it works fine, and torch.cuda.is_available() returns True.
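
For reference, a minimal sketch (assuming PyTorch is available in the container, as it is in the l4t-ml image) to print the GPU’s compute capability from the same environment:

import torch

# On an AGX Orin this prints (8, 7), i.e. the sm_87 architecture.
print(torch.cuda.get_device_capability(0))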

Any idea why the CUDA execution provider is not working in onnxruntime?


Moving this post to the Jetson AGX Orin forum to get better help.

Hi,

CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device

We have seen a similar error reported in a previous topic.

The root cause is that Orin’s GPU architecture (sm_87) is not included in the ONNX Runtime source’s CUDA architecture list by default.
So you will need to add the configuration manually.

Would you mind checking whether the build instructions below fix your issue?

Thanks.

Hello AastaLLL,
thank you for your answer. I can confirm that adding

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_87,code=sm_87") # AGX Orin

in the CMakeLists.txt and rebuilding the wheel from source fixed the problem on my Jetson Orin!
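
For anyone verifying their own rebuild, here is a minimal check (a sketch reusing the model and input shape from my post above) that forces the CUDA execution provider alone, so TensorRT cannot mask a CUDA failure:

import onnxruntime as ort
import numpy as np

# CPUExecutionProvider is always appended automatically by onnxruntime.
sess = ort.InferenceSession("my_onnx_model.onnx", providers=["CUDAExecutionProvider"])
print(sess.get_providers())  # ['CUDAExecutionProvider', 'CPUExecutionProvider']

# With the sm_87 gencode compiled in, this no longer raises
# cudaErrorNoKernelImageForDevice.
x = np.zeros((1, 3, 640, 640), dtype=np.float32)
out = sess.run(None, {sess.get_inputs()[0].name: x})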

Thanks a lot!

I have difficulty compiling onnxruntime-gpu. Could you provide the .whl file compiled on a Jetson Orin? Thanks a lot.

Hi cssdcc1997,

Please open a new topic if this is still an issue. Thanks.