Build onnxruntime-gpu wheel for JetPack 5 with CUDA and TensorRT

Description

I’ve been trying to run a model with onnxruntime-gpu on a Jetson AGX Orin Developer Kit running JetPack 5.0.1. I followed the guide at faxu.github.io/onnxinference (sorry, I can’t post a proper link as a new account) to build onnxruntime from source with CUDA and TensorRT support.

This is the build command I used:

./build.sh --config Release --update --build --parallel --build_wheel \
    --use_tensorrt --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
    --tensorrt_home /usr/lib/aarch64-linux-gnu

When I try to run the model with onnxruntime, TensorRT works fine, but CUDA does not.
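
As a first diagnostic (a minimal sketch, not part of my original notebook), the execution providers the wheel was built with can be listed like this:

import onnxruntime as ort

# A wheel built with --use_cuda and --use_tensorrt should list both
# 'TensorrtExecutionProvider' and 'CUDAExecutionProvider' here.
print(ort.get_available_providers())

Note that a provider can appear in this list and still fail later, when a CUDA kernel is actually launched.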

Environment

Device: Jetson AGX Orin Developer Kit
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
Container: l4t-ml:r34.1.1-py3
CUDA Version: 11.4
TensorRT Version: 8.4

Relevant Files

onnxruntime_gpu-1.13.0-cp38-cp38-linux_aarch64.whl (24.9 MB)

Steps To Reproduce

On my Jetson AGX Orin with JetPack 5 installed, I launch a Docker container with this command:

docker run -it --rm --runtime nvidia --network host -v test:/opt/test l4t-ml:r34.1.1-py3

Here is a snippet of the code I run in a notebook inside the container:


# install the onnxruntime-gpu wheel I built; you can find it attached to this post

!pip install onnxruntime_gpu-1.13.0-cp38-cp38-linux_aarch64.whl

import onnxruntime as ort
import numpy as np

providers = [
    ('TensorrtExecutionProvider', {
        'trt_fp16_enable': True,
    }),
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    })
]

img = np.zeros((3, 640, 640), dtype=np.float32)
session_trt = ort.InferenceSession("my_onnx_model.onnx", providers=providers)
ort_inputs = {session_trt.get_inputs()[0].name: img[None, :, :, :]}
out = session_trt.run(None, ort_inputs)

and I get this exception:

2022-07-26 16:16:12.161594665 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running Sigmoid node. Name:'Sigmoid_36' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device

---------------------------------------------------------------------------
Fail                                      Traceback (most recent call last)
Input In [16], in <cell line: 3>()
      1 session_trt = ort.InferenceSession("my_onnx_model.onnx", providers=providers)
      2 ort_inputs = {session_trt.get_inputs()[0].name: img[None, :, :, :]}
----> 3 out = session_trt.run(None, ort_inputs)

File /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py:200, in Session.run(self, output_names, input_feed, run_options)
    198     output_names = [output.name for output in self._outputs_meta]
    199 try:
--> 200     return self._sess.run(output_names, input_feed, run_options)
    201 except C.EPFail as err:
    202     if self._enable_fallback:

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Sigmoid node. Name:'Sigmoid_36' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device

I run the same code on a Jetson Nano with JetPack 4.6 and onnxruntime-gpu 1.11 downloaded from Jetson_Zoo#ONNX_Runtime (sorry, I can’t post the link since I’m a new user), and everything works fine.

I’d like to be able to use the CUDA execution provider to test more ONNX models and compare their performance across the different execution providers on my Jetson Orin.

If I run my PyTorch model using CUDA, it works fine, and torch.cuda.is_available() returns True.
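
For reference, a minimal sketch (assuming PyTorch is available in the container, as it is in the l4t-ml image) to print the GPU’s compute capability from the same environment:

import torch

# On an AGX Orin this prints (8, 7), i.e. the sm_87 architecture.
print(torch.cuda.get_device_capability(0))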

Any idea why the CUDA execution provider is not working in onnxruntime?


Moving this post to the Jetson AGX Orin forum to get better help.

Hi,

CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device

We have seen a similar error reported in a previous topic.

The root cause is that Orin’s GPU architecture (sm_87) is not included in the ONNX Runtime source’s CUDA architecture list by default.
So you will need to add the configuration manually.

Would you mind checking whether the build instructions below fix your issue?

Thanks.

Hello AastaLLL,
thank you for your answer. I can confirm that adding

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_87,code=sm_87") # AGX Orin

in the CMakeLists.txt and rebuilding the wheel from source fixed the problem on my Jetson Orin!
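
For anyone verifying their own rebuild, here is a minimal check (a sketch reusing the model and input shape from my post above) that forces the CUDA execution provider alone, so TensorRT cannot mask a CUDA failure:

import onnxruntime as ort
import numpy as np

# CPUExecutionProvider is always appended automatically by onnxruntime.
sess = ort.InferenceSession("my_onnx_model.onnx", providers=["CUDAExecutionProvider"])
print(sess.get_providers())  # ['CUDAExecutionProvider', 'CPUExecutionProvider']

# With the sm_87 gencode compiled in, this no longer raises
# cudaErrorNoKernelImageForDevice.
x = np.zeros((1, 3, 640, 640), dtype=np.float32)
out = sess.run(None, {sess.get_inputs()[0].name: x})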

Thanks a lot!

I have difficulty compiling onnxruntime-gpu. Could you provide the .whl file compiled on a Jetson Orin? Thanks a lot.

Hi cssdcc1997,

Please open a new topic if this is still an issue. Thanks.