Slow inference with YOLOv8 PyTorch on AGX Orin

Hello everyone,
my inference time is too slow and I wonder if I am making an obvious mistake. This is my setup:
Python version: 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
Python Path: /usr/bin/python3
OpenCV version: 4.5.4
OpenCV Path: /usr/local/lib/python3.8/site-packages/cv2/python-3.8
CUDA Version: 11.4
cuDNN Version: 8401
Ultralytics YOLOv8.0.140 πŸš€ Python-3.8.10 torch-1.13.0a0+d0d6b1f2.nv22.09 CUDA:0 (Orin, 30536MiB)
Setup complete βœ… (12 CPUs, 29.8 GB RAM, 35.0/56.7 GB disk)
YOLOv8 v0.1-121-g2fdc7f1 torch 1.13.0a0+d0d6b1f2.nv22.09 CUDA:0 (Orin, 30535.83203125MB)
Model summary: 225 layers, 3011043 parameters, 0 gradients

I am running a YOLOv8n model on a live 640x480 feed from a RealSense camera. The inference time is around 18 ms and the total detection process takes around 35 ms. Looking at published benchmarks, these times should be 3-5 times lower. Where do I have to look to speed up the process? Is there a way to test my system?
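
For example, would a standalone timing loop like this be a fair check (just a rough sketch: a random 640x480 frame instead of the RealSense feed, warm-up and iteration counts picked arbitrarily, model path from my setup)?

import time

import numpy as np
from ultralytics import YOLO

model = YOLO("/home/tec/yolov8/models/v8nbest.pt")

# Synthetic 640x480 BGR frame so the camera is excluded from the measurement
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

# Warm-up: the first calls include CUDA context and kernel initialization
for _ in range(10):
    model(frame, verbose=False)

# Timed runs (100 iterations chosen arbitrarily)
n = 100
start = time.time()
for _ in range(n):
    model(frame, verbose=False)
print(f"Average end-to-end time: {(time.time() - start) / n * 1000:.2f} ms")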

This is a normal output:
Marker 1 Time for capturing frame: 4.37 ms
0: 480x640 1 door, 23.5ms
Speed: 2.0ms preprocess, 23.5ms inference, 2.8ms postprocess per image at shape (1, 3, 480, 640)
Marker 3 Time for object detection: 30.36 ms

This is my test code:

import pyrealsense2 as rs
import numpy as np
import cv2
from ultralytics import YOLO
import time
prev_time = 0
# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 60)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 60)
# Start streaming
pipeline.start(config)
# Align the depth frame to color frame
align_to = rs.stream.color
align = rs.align(align_to)

model = YOLO("/home/tec/yolov8/models/v8nbest.pt")

try:
    while True:
        start_time = cv2.getTickCount()
        marker_1_start = time.time()
        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        aligned_frames = align.process(frames) # Align the frames
        depth_frame = aligned_frames.get_depth_frame()
        color_frame = aligned_frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue
         
        # Convert images to numpy arrays
        color_image = np.asanyarray(color_frame.get_data())

        marker_1_end = time.time()
        print(f"    Marker 1 Time for capturing frame: {(marker_1_end - marker_1_start) * 1000:.2f} ms")

        # marker_3: Time taken for object detection in the frame
        marker_3_start = time.time()

        # Run YOLO inference
        result = model(color_image, verbose=True, show=False)
     
        marker_3_end = time.time()
        print(f"    Marker 3 Time for object detection: {(marker_3_end - marker_3_start) * 1000:.2f} ms")
     
        # Calculate FPS
        end_time = cv2.getTickCount()
        frame_time = (end_time - start_time) / cv2.getTickFrequency()
        fps = 1 / frame_time
        
        # Render FPS on the video frames
        cv2.putText(color_image, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # Show images in separate windows
        cv2.namedWindow('RGB Image', cv2.WINDOW_AUTOSIZE)
        cv2.imshow('RGB Image', color_image)

        # Break the loop if 'q' key is pressed
        if cv2.waitKey(1) == ord('q'):
            break

finally:
    # Stop streaming
    pipeline.stop()

    # Close all OpenCV windows
    cv2.destroyAllWindows()

Thanks

Hi,

Have you maximized the device?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

I also have the same problem.

I have maximized the device as suggested, with no difference in inference time.

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Would appreciate insights.

Hi,

Could you try it with TensorRT or DeepStream?
There are some samples written by the community to show how to do this.

For example GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
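
If you prefer to stay within the Ultralytics API, it can also export the model to a TensorRT engine directly. A rough sketch (the path, FP16 option, and random test frame are just examples):

import numpy as np
from ultralytics import YOLO

# One-time export of the PyTorch weights to a TensorRT engine (FP16)
model = YOLO("/home/tec/yolov8/models/v8nbest.pt")
model.export(format="engine", half=True, device=0)  # writes v8nbest.engine next to the .pt file

# The exported engine loads and runs through the same interface as the .pt model
trt_model = YOLO("/home/tec/yolov8/models/v8nbest.engine")
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # placeholder input
results = trt_model(frame)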

Thanks.

Hi, thanks for the reply. My device is maximized. No change …

Hi,
I am aware that I have to use TensorRT to use the full potential of the GPU, but I wonder why I am so far off from these results:


Thanks for the interesting repo, I will look into that.

Hi,

Our results are based on TensorRT.
It’s recommended to give it a try.
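
In your test script, only the model path would need to change, assuming the weights were exported to an engine as in the earlier sketch:

# Load the exported TensorRT engine instead of the .pt weights
model = YOLO("/home/tec/yolov8/models/v8nbest.engine")
# The capture and inference loop stays the same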

Thanks.
