Slow inference with YOLOv8 PyTorch on AGX Orin

Hello everyone,
my inference time is too slow and I wonder if I am making an obvious mistake. This is my setup:
Python version: 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
Python Path: /usr/bin/python3
OpenCV version: 4.5.4
OpenCV Path: /usr/local/lib/python3.8/site-packages/cv2/python-3.8
CUDA Version: 11.4
cuDNN Version: 8401
Ultralytics YOLOv8.0.140 πŸš€ Python-3.8.10 torch-1.13.0a0+d0d6b1f2.nv22.09 CUDA:0 (Orin, 30536MiB)
Setup complete βœ… (12 CPUs, 29.8 GB RAM, 35.0/56.7 GB disk)
YOLOv8 v0.1-121-g2fdc7f1 torch 1.13.0a0+d0d6b1f2.nv22.09 CUDA:0 (Orin, 30535.83203125MB)
Model summary: 225 layers, 3011043 parameters, 0 gradients

I am running a YOLOv8n model on a live 640x480 feed from a RealSense camera. The inference time is around 18 ms and the total detection process takes around 35 ms. Looking at published benchmarks, these times should be 3-5 times lower. Where do I have to look to speed up the process? Is there a way to test my system?
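
For example, would a standalone timing loop like this be a fair check (just a rough sketch: a random 640x480 frame instead of the RealSense feed, warm-up and iteration counts picked arbitrarily, model path from my setup)?

import time

import numpy as np
from ultralytics import YOLO

model = YOLO("/home/tec/yolov8/models/v8nbest.pt")

# Synthetic 640x480 BGR frame so the camera is excluded from the measurement
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

# Warm-up: the first calls include CUDA context and kernel initialization
for _ in range(10):
    model(frame, verbose=False)

# Timed runs (100 iterations chosen arbitrarily)
n = 100
start = time.time()
for _ in range(n):
    model(frame, verbose=False)
print(f"Average end-to-end time: {(time.time() - start) / n * 1000:.2f} ms")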

This is a normal output:
Marker 1 Time for capturing frame: 4.37 ms
0: 480x640 1 door, 23.5ms
Speed: 2.0ms preprocess, 23.5ms inference, 2.8ms postprocess per image at shape (1, 3, 480, 640)
Marker 3 Time for object detection: 30.36 ms

This is my test code:

import pyrealsense2 as rs
import numpy as np
import cv2
from ultralytics import YOLO
import time
prev_time = 0
# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 60)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 60)
# Start streaming
pipeline.start(config)
# Align the depth frame to color frame
align_to = rs.stream.color
align = rs.align(align_to)

model = YOLO("/home/tec/yolov8/models/v8nbest.pt")

try:
    while True:
        start_time = cv2.getTickCount()
        marker_1_start = time.time()
        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        aligned_frames = align.process(frames) # Align the frames
        depth_frame = aligned_frames.get_depth_frame()
        color_frame = aligned_frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue
         
        # Convert images to numpy arrays
        color_image = np.asanyarray(color_frame.get_data())

        marker_1_end = time.time()
        print(f"    Marker 1 Time for capturing frame: {(marker_1_end - marker_1_start) * 1000:.2f} ms")

        # marker_3: Time taken for object detection in the frame
        marker_3_start = time.time()

        # Run YOLO inference
        result = model(color_image, verbose=True, show=False)
     
        marker_3_end = time.time()
        print(f"    Marker 3 Time for object detection: {(marker_3_end - marker_3_start) * 1000:.2f} ms")
     
        # Calculate FPS
        end_time = cv2.getTickCount()
        frame_time = (end_time - start_time) / cv2.getTickFrequency()
        fps = 1 / frame_time
        
        # Render FPS on the video frames
        cv2.putText(color_image, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # Show images in separate windows
        cv2.namedWindow('RGB Image', cv2.WINDOW_AUTOSIZE)
        cv2.imshow('RGB Image', color_image)

        # Break the loop if 'q' key is pressed
        if cv2.waitKey(1) == ord('q'):
            break

finally:
    # Stop streaming
    pipeline.stop()

    # Close all OpenCV windows
    cv2.destroyAllWindows()

Thanks

Hi,

Have you maximized the device?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

I also have the same problem.

I have maximized the device as suggested, with no difference in inference time.

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Would appreciate insights.

Hi,

Could you try it with TensorRT or DeepStream?
There are some samples written by the community to show how to do this.

For example GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
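
If you prefer to stay within the Ultralytics API, it can also export the model to a TensorRT engine directly. A rough sketch (the path, FP16 option, and random test frame are just examples):

import numpy as np
from ultralytics import YOLO

# One-time export of the PyTorch weights to a TensorRT engine (FP16)
model = YOLO("/home/tec/yolov8/models/v8nbest.pt")
model.export(format="engine", half=True, device=0)  # writes v8nbest.engine next to the .pt file

# The exported engine loads and runs through the same interface as the .pt model
trt_model = YOLO("/home/tec/yolov8/models/v8nbest.engine")
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # placeholder input
results = trt_model(frame)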

Thanks.

Hi, thanks for the reply. My device is maximized. No change …

Hi,
I am aware that I have to use TensorRT to use the full potential of the GPU, but I wonder why I am so far off from these results:


Thanks for the interesting repo, I will look into that.

Hi,

Our results are based on TensorRT.
It’s recommended to give it a try.
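
In your test script, only the model path would need to change, assuming the weights were exported to an engine as in the earlier sketch:

# Load the exported TensorRT engine instead of the .pt weights
model = YOLO("/home/tec/yolov8/models/v8nbest.engine")
# The capture and inference loop stays the same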

Thanks.
