Cannot run YOLO on Jetson AGX Orin

I have followed these docs/guides:

Building pytorch - PyTorch for Jetson
For ultralytics on Jetson - NVIDIA Jetson - Ultralytics YOLO Docs

Thank you for your support.

Here is my env info:

python3 -m torch.utils.collect_env
Collecting environment information…
PyTorch version: 2.1.0
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.19.1
Libc version: glibc-2.31

Python version: 3.8.19 (default, Mar 20 2024, 19:53:40) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.10.120-tegra-aarch64-with-glibc2.26
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.7.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 3
Vendor ID: ARM
Model: 1
Model name: ARMv8 Processor rev 1 (v8l)
Stepping: r0p1
CPU max MHz: 2201.6001
CPU min MHz: 115.2000
BogoMIPS: 62.50
L1d cache: 768 KiB
L1i cache: 768 KiB
L2 cache: 3 MiB
L3 cache: 6 MiB
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==2.1.0
[pip3] torchvision==0.16.2+c6f3977
[conda] numpy 1.23.5 pypi_0 pypi
[conda] pytorch-cuda 11.8 h8dd9ede_2 pytorch
[conda] torch 2.1.0 pypi_0 pypi

I get this exception when trying to export a model:

yolo export model=yolov8n.pt format=engine

WARNING ⚠️ TensorRT requires GPU export, automatically assigning device=0
Ultralytics 8.3.7 🚀 Python-3.8.19 torch-2.1.0 CUDA:0 (Orin, 62800MiB)
YOLOv8n summary (fused): 168 layers, 3,151,904 parameters, 0 gradients
Traceback (most recent call last):
File “/home/jacob/work/github/jpisaac/testproj/env/bin/yolo”, line 8, in <module>
sys.exit(entrypoint())
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/ultralytics/cfg/__init__.py”, line 831, in entrypoint
getattr(model, mode)(**overrides) # default args from model
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/ultralytics/engine/model.py”, line 736, in export
return Exporter(overrides=args, _callbacks=self.callbacks)(model=self.model)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/utils/_contextlib.py”, line 115, in decorate_context
return func(*args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/ultralytics/engine/exporter.py”, line 265, in __call__
y = model(im) # dry runs
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1527, in _call_impl
return forward_call(*args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/ultralytics/nn/tasks.py”, line 111, in forward
return self.predict(x, *args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/ultralytics/nn/tasks.py”, line 129, in predict
return self._predict_once(x, profile, visualize, embed)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/ultralytics/nn/tasks.py”, line 150, in _predict_once
x = m(x) # run
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1527, in _call_impl
return forward_call(*args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/ultralytics/nn/modules/conv.py”, line 54, in forward_fuse
return self.act(self.conv(x))
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1527, in _call_impl
return forward_call(*args, **kwargs)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/conv.py”, line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File “/home/jacob/work/github/jpisaac/testproj/env/lib/python3.8/site-packages/torch/nn/modules/conv.py”, line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation
Sentry is attempting to send 2 pending events
Waiting up to 2 seconds
Press Ctrl-C to quit
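
The traceback ends in a plain `F.conv2d` call, so the failure can probably be reproduced without Ultralytics or TensorRT. Below is a minimal sketch of such a check, assuming only that `torch` is importable in the same environment. The function name `conv2d_smoke_test` and the tensor sizes are illustrative and do not come from the thread.

```python
def conv2d_smoke_test():
    """Run one small convolution on the GPU and report what happened.

    Returns an (ok, message) tuple instead of raising, so the result
    is easy to paste into a reply.
    """
    try:
        import torch
        import torch.nn.functional as F
    except (ImportError, OSError) as exc:
        return False, f"torch import failed: {exc}"

    if not torch.cuda.is_available():
        return False, "CUDA is not available to this PyTorch build"

    cudnn_version = None
    try:
        # cuDNN version PyTorch loaded (None if cuDNN is unavailable)
        cudnn_version = torch.backends.cudnn.version()
        # Arbitrary small shapes: batch of 1, 3 input channels, 16 filters
        x = torch.randn(1, 3, 64, 64, device="cuda")
        w = torch.randn(16, 3, 3, 3, device="cuda")
        y = F.conv2d(x, w, padding=1)
        # Wait for the kernel so asynchronous errors surface here
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return False, f"conv2d failed (cuDNN {cudnn_version}): {exc}"
    return True, f"conv2d ok, output shape {tuple(y.shape)}, cuDNN {cudnn_version}"


if __name__ == "__main__":
    print(conv2d_smoke_test())
```

If this fails with the same RuntimeError, the problem most likely sits below Ultralytics. Setting `torch.backends.cudnn.enabled = False` before the call and rerunning may then help show whether cuDNN specifically is involved.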

Hi,

RuntimeError: GET was unable to find an engine to execute this computation

Based on the error in the log, the issue might be related to the PyTorch installation.
Could you share how you installed the package on Orin?

Thanks.

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide of deep learning frameworks on Jetson:

3. Tutorial

Startup deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and the customized app (if any) so that we can reproduce the issue locally.

Thanks!

I tried installing PyTorch both ways:

  1. Using the pre-built wheel from here for JetPack 5.1 (torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl)

  2. Building PyTorch from source (branch v2.1.0)

Both of them give the same error.

My power mode is already at MAXN (0); I will try running jetson_clocks as well. I am not sure it will help, though, since I cannot run even the most basic yolo commands.

I ran another test for cuDNN as explained here -

This is the failure I get:

jacob@jacob-desktop:~/work/tmp/cudnn-8.7/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8700 , CUDNN_VERSION from cudnn.h : 8700 (8.7.0)
Host compiler version : GCC 9.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 16 Capabilities 8.7, SmClock 1300.0 Mhz, MemSize (Mb) 62800, MemClock 1300.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_EXECUTION_FAILED for Algo 7: -1.000000 time requiring 2057744 memory
ERROR: cudnn failure (CUDNN_STATUS_EXECUTION_FAILED) in mnistCUDNN.cpp:625
Aborting…

Hopefully this helps an expert understand what’s wrong with my setup.
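
One way to read the log above: the heuristic query (`cudnnGetConvolutionForwardAlgorithm_v7`) returns SUCCESS for most algorithms, while `cudnnFindConvolutionForwardAlgorithm`, which actually runs them, reports EXECUTION_FAILED for every algorithm that is not NOT_SUPPORTED. The sketch below tabulates that per section. It assumes the log format shown in this post; the helper name `summarize_algo_status` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory"
ALGO_LINE = re.compile(r"(CUDNN_STATUS_\w+) for Algo (\d+):")
# Matches section headers such as "Testing cudnnFindConvolutionForwardAlgorithm"
# but not "Testing single precision"
SECTION_LINE = re.compile(r"Testing (cudnn\w+)")


def summarize_algo_status(log_text):
    """Return {section: {status: [algo ids]}} for a mnistCUDNN log."""
    summary = defaultdict(lambda: defaultdict(list))
    section = None
    for line in log_text.splitlines():
        header = SECTION_LINE.search(line)
        if header:
            section = header.group(1)
            continue
        match = ALGO_LINE.search(line)
        if match and section is not None:
            status, algo = match.group(1), int(match.group(2))
            summary[section][status].append(algo)
    # Convert to plain dicts with sorted algorithm ids for stable output
    return {
        name: {status: sorted(algos) for status, algos in statuses.items()}
        for name, statuses in summary.items()
    }


if __name__ == "__main__":
    import sys
    # Usage: ./mnistCUDNN | python3 this_script.py
    for name, statuses in summarize_algo_status(sys.stdin.read()).items():
        print(name, statuses)
```

A pattern where the query succeeds but every execution fails suggests the library loads but its kernels cannot run on the device, which would be consistent with a mismatched cuDNN/CUDA installation, though the log alone does not prove that.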

Hi,

It looks like there are two cuDNN versions installed on your device (v8.7.0 and v9.4.0).
Could you keep the default JetPack cuDNN and remove the other one?

cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.7.0

Thanks.
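
The mixed-version observation can be checked mechanically. The sketch below groups library paths by the version suffix in their file names and flags the case where more than one version is present. It assumes the `libcudnn*.so.X.Y.Z` naming seen in the listing above; the function names are illustrative.

```python
import re
from collections import defaultdict

# Captures the trailing "major.minor.patch" of names like "libcudnn_ops.so.9.4.0"
VERSION_SUFFIX = re.compile(r"\.so\.(\d+)\.(\d+)\.(\d+)$")


def group_cudnn_by_version(paths):
    """Map each cuDNN version string to the library file names carrying it."""
    groups = defaultdict(list)
    for path in paths:
        path = path.strip()
        match = VERSION_SUFFIX.search(path)
        if match:
            version = ".".join(match.groups())
            # Keep only the file name, not the directory
            groups[version].append(path.rsplit("/", 1)[-1])
    return dict(groups)


def has_mixed_cudnn(paths):
    """True when the listed libraries span more than one cuDNN version."""
    return len(group_cudnn_by_version(paths)) > 1


if __name__ == "__main__":
    # A few paths copied from the listing in this thread
    listed = [
        "/usr/lib/aarch64-linux-gnu/libcudnn.so.8.7.0",
        "/usr/lib/aarch64-linux-gnu/libcudnn.so.9.4.0",
        "/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.4.0",
    ]
    print(group_cudnn_by_version(listed))
    print("mixed versions:", has_mixed_cudnn(listed))
```

On the device itself, the input could come from `glob.glob("/usr/lib/aarch64-linux-gnu/libcudnn*.so.*")` instead of a pasted list. Symlinks such as `libcudnn.so.8` do not match the three-part suffix and are skipped.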

I am currently in the process of upgrading to JetPack 6.1. I will post here if that transition is successful. Thanks for your support.

I am using JetPack 6.1 now and can successfully run the torch and cuDNN samples:

python3 -m torch.utils.collect_env

/usr/lib/python3.10/runpy.py:126: RuntimeWarning: ‘torch.utils.collect_env’ found in sys.modules after import of package ‘torch.utils’, but prior to execution of ‘torch.utils.collect_env’; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
Collecting environment information…
PyTorch version: 2.5.0a0+872d972e41.nv24.08
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.148-tegra-aarch64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.6.77
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Orin (nvgpu)
Nvidia driver version: 540.4.0
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.5.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.5.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.5.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.5.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.5.0
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.5.0
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.5.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: ARM
Model name: Cortex-A78AE
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 3
Stepping: r0p1
CPU max MHz: 2201.6001
CPU min MHz: 115.2000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache: 768 KiB (12 instances)
L1i cache: 768 KiB (12 instances)
L2 cache: 3 MiB (12 instances)
L3 cache: 6 MiB (3 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.23.5
[pip3] onnx==1.17.0
[pip3] onnxruntime==1.15.1
[pip3] onnxruntime-gpu==1.19.0
[pip3] onnxslim==0.1.35
[pip3] torch==2.5.0a0+872d972e41.nv24.8
[pip3] torchvision==0.20.0
[conda] Could not collect

Hi,

Good to know this.
Thanks for the update.
