Need help with slower inference using YOLOv8 on NVIDIA Orin Nano 4GB

Hello everyone,

I hope this message finds you well. I am currently working on a project that involves dynamically counting objects using computer vision. Initially, I used the NVIDIA Jetson Nano DevKit with YOLOv8 for object detection, and the inference speed was acceptable.

However, I recently switched to the NVIDIA Orin Nano 4GB, which is supposed to be more powerful than the Jetson Nano. Surprisingly, the inference speed using YOLOv8 on the Orin Nano is significantly slower than on the Jetson Nano. I’m puzzled by this unexpected behavior and would greatly appreciate any insights or suggestions.

Here are some details about my setup:

  • Hardware: NVIDIA Orin Nano 4GB
  • Object Detection Model: YOLOv8
  • Jetpack: 5.1.1

Could you please help me troubleshoot this issue? What could be the possible reasons for the slower inference speed on the more powerful Orin Nano? Are there any specific optimizations or adjustments I should consider to improve the performance?

Thank you in advance for your time and assistance. I’m looking forward to your suggestions.


Which frameworks are you using for inference? Is it TensorRT?

We would like to reproduce this issue in our environment.
Could you share the detailed steps/source with us?



Thank you for your reply,

I didn’t use TensorRT.
After installing JetPack 5.1.1 (SD card image method) and completing the first boot and setup, I followed the steps on the Quickstart - Ultralytics YOLOv8 Docs page and installed YOLOv8 by cloning the git repository.
(On the Jetson Nano with JetPack 4.6, I created a virtual env with Python 3.8 to install YOLOv8.)
After the installation finished, I rebooted and tested the predict task. It runs and detects well, but it is much slower than on the Jetson Nano (~300 ms per inference on the Orin Nano compared to 170 ms on the Jetson Nano).
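
For anyone trying to reproduce these numbers, here is a minimal timing sketch. It is not what was used for the measurements above; the helper name `benchmark` and its `warmup`/`runs` defaults are assumptions for illustration. The idea is to discard the first few calls (which include model loading and GPU initialization) before averaging, so both boards are compared on steady-state latency.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=20):
    """Call fn() repeatedly and return latency statistics in milliseconds.

    The first `warmup` calls are executed but not timed, because early
    calls usually include one-time setup cost.
    """
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload so the script runs without any model installed.
    print(benchmark(lambda: sum(range(100_000))))
```

With Ultralytics installed, the stand-in workload could be replaced by something like `lambda: model.predict("test.jpg", verbose=False)`, where `model = YOLO("yolov8n.pt")` and `test.jpg` is a hypothetical local image. Running the same script on both boards gives comparable figures.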

@najwa.belarbi Did you find the reason for this?

no… still nothing :(

I think you are comparing both on the CPU, not CUDA. You should install either the GPU build of torch or onnxruntime, and of course the best option is TensorRT.
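
A quick way to check this is to ask PyTorch whether it can see the GPU at all. The sketch below is only a diagnostic, and the function name `describe_torch_device` is made up for illustration; it uses `torch.cuda.is_available()` and `torch.cuda.get_device_name()`. If `cuda_available` comes back `False` on the Orin Nano, the installed torch is most likely a CPU-only build and inference is running on the CPU.

```python
def describe_torch_device():
    """Report whether PyTorch is installed and whether it can use CUDA."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_available": False,
        "device_name": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_available"] = bool(torch.cuda.is_available())
    if info["cuda_available"]:
        # Index 0 is the first (and on a Jetson, the only) GPU.
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

If CUDA is available, passing `device=0` to the Ultralytics predict call should select the GPU explicitly.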

We’ve been seeing similar results with PyTorch on the GPU. @najwa.belarbi You use PyTorch on the GPU too, right? Did you try TensorRT? Does the trend reverse?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.


Could you share the detailed steps to reproduce this issue?
Is it possible to reproduce it with CUDA code?