Improving FP32 inference speed for YOLOv10x (Ultralytics) on a Jetson AGX Orin 64GB devkit

I have connected two RealSense D435 cameras to the Jetson AGX Orin 64GB devkit (each camera shows its own independent output and is only used for object detection).

I was under the assumption that building the TensorRT engine with INT8 optimization would yield faster inference.

However, that does not seem to be the case.

Currently, this is the speed I get for the FP32 engine; it also varies from run to run:

0: 640x640 1 person, 48.8ms
Speed: 10.1ms preprocess, 48.8ms inference, 20.4ms postprocess per image at shape (1, 3, 640, 640)
 
0: 640x640 1 person, 56.4ms
Speed: 15.4ms preprocess, 56.4ms inference, 9.2ms postprocess per image at shape (1, 3, 640, 640)
255

If I use INT8, it is not that much faster and the accuracy also drops (as expected): it sometimes fails to detect a person who is clearly in frame, despite a 0.5 confidence threshold. A sketch of how the INT8 engine was exported follows the logs below.

0: 640x640 (no detections), 33.9ms
Speed: 7.0ms preprocess, 33.9ms inference, 2.4ms postprocess per image at shape (1, 3, 640, 640)
 
0: 640x640 2 persons, 23.2ms
Speed: 10.3ms preprocess, 23.2ms inference, 9.5ms postprocess per image at shape (1, 3, 640, 640)
255
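
For reference, I exported the INT8 engine roughly along these lines (a minimal sketch; the calibration dataset YAML passed as data is only a placeholder for whatever images are actually used to calibrate):

from ultralytics import YOLO

# Export a TensorRT INT8 engine from the PyTorch checkpoint.
# INT8 requires calibration images; "coco8.yaml" is only a placeholder here.
model = YOLO("yolov10x.pt")
model.export(format="engine", int8=True, data="coco8.yaml", imgsz=640, device=0)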

For each camera, I am using a dedicated model, which is technically the same engine file copied twice:

model1 = YOLO("yolov10x_cam1_fp32.engine", task="detect")
model2 = YOLO("yolov10x_cam2_fp32.engine", task="detect")

Please note the RGB stream is real-time at 30 FPS. My goal is to have a (near) real-time feel for both cameras when they are connected. Currently, it feels a bit laggy.
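
For context, this is roughly how both cameras and engines are driven (a minimal sketch; the RealSense serial numbers are placeholders and error handling is omitted):

import cv2
import numpy as np
import pyrealsense2 as rs
from ultralytics import YOLO

# One engine per camera (the two files are identical copies).
model1 = YOLO("yolov10x_cam1_fp32.engine", task="detect")
model2 = YOLO("yolov10x_cam2_fp32.engine", task="detect")

def start_camera(serial):
    # Open one D435 RGB stream at 640x480 @ 30 FPS.
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_device(serial)
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
    pipeline.start(config)
    return pipeline

# Placeholder serial numbers for the two D435 cameras.
pipe1 = start_camera("000000000001")
pipe2 = start_camera("000000000002")

while True:
    for name, pipe, model in (("cam1", pipe1, model1), ("cam2", pipe2, model2)):
        color = pipe.wait_for_frames().get_color_frame()
        frame = np.asanyarray(color.get_data())
        results = model(frame, conf=0.5, verbose=False)
        cv2.imshow(name, results[0].plot())
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break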

Some further info:

$ pip show ultralytics
Name: ultralytics
Version: 8.2.90
Summary: Ultralytics YOLOv8 for SOTA object detection, multi-object tracking, instance segmentation, pose estimation and image classification.
Home-page: 
Author: Glenn Jocher, Ayush Chaurasia, Jing Qiu
Author-email: 
License: AGPL-3.0
Location: /home/mona/.local/lib/python3.10/site-packages
Requires: matplotlib, numpy, opencv-python, pandas, pillow, psutil, py-cpuinfo, pyyaml, requests, scipy, seaborn, torch, torchvision, tqdm, ultralytics-thop
Required-by: 

$ python
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux

$ uname -a
Linux ubuntu 5.15.136-tegra #1 SMP PREEMPT Mon May 6 09:56:39 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
mona@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
mona@ubuntu:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:08:11_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


(screenshot attached)

Hi,

Have you tried this with TensorRT?
Also, could you run tegrastats at the same time and share the output with us?

Thanks.

Yes, I converted the model to a TensorRT engine using FP32 (if you don't request INT8, the default is FP32) via the Ultralytics API in Python.
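
The conversion was done along these lines (a minimal sketch; with neither half nor int8 set, the engine is built in FP32):

from ultralytics import YOLO

# Export the PyTorch checkpoint to a TensorRT engine.
# Without half=True or int8=True, the engine defaults to FP32.
model = YOLO("yolov10x.pt")
model.export(format="engine", imgsz=640, device=0)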

I also used the jetson_clocks command.

Can you explain this further?

Also, when I run jetson_clocks, I keep getting overcurrent and throttling warnings, even though the temperature is not too high.

Hi,

Please run the following command and share the output with us.

$ sudo tegrastats

The command reports CPU/GPU utilization, so we can tell whether the hardware resources are saturated.

Thanks.