Hello everyone,
my inference time is too slow and I wonder if I'm making an obvious mistake. This is my setup:
Python version: 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0]
Python Path: /usr/bin/python3
OpenCV version: 4.5.4
OpenCV Path: /usr/local/lib/python3.8/site-packages/cv2/python-3.8
CUDA Version: 11.4
cuDNN Version: 8401
Ultralytics YOLOv8.0.140 🚀 Python-3.8.10 torch-1.13.0a0+d0d6b1f2.nv22.09 CUDA:0 (Orin, 30536MiB)
Setup complete ✅ (12 CPUs, 29.8 GB RAM, 35.0/56.7 GB disk)
YOLOv8 v0.1-121-g2fdc7f1 torch 1.13.0a0+d0d6b1f2.nv22.09 CUDA:0 (Orin, 30535.83203125MB)
Model summary: 225 layers, 3011043 parameters, 0 gradients
I am running a YOLOv8n model on a live feed from a RealSense camera at 640x480. The inference time is around 18 ms, and the whole detection step takes around 35 ms. Published benchmarks suggest these times should be 3-5 times lower. Where should I look to speed up the process, and is there a way to test my system? (I put a small timing sketch after the example output below.)
This is a typical output:
Marker 1 Time for capturing frame: 4.37 ms
0: 480x640 1 door, 23.5ms
Speed: 2.0ms preprocess, 23.5ms inference, 2.8ms postprocess per image at shape (1, 3, 480, 640)
Marker 3 Time for object detection: 30.36 ms
This is my test code:
import pyrealsense2 as rs
import numpy as np
import cv2
from ultralytics import YOLO
import time

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 60)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 60)

# Start streaming
pipeline.start(config)

# Align the depth frame to the color frame
align_to = rs.stream.color
align = rs.align(align_to)

model = YOLO("/home/tec/yolov8/models/v8nbest.pt")

try:
    while True:
        start_time = cv2.getTickCount()

        # Marker 1: time taken to capture and convert a frame
        marker_1_start = time.time()
        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        aligned_frames = align.process(frames)  # Align the frames
        depth_frame = aligned_frames.get_depth_frame()
        color_frame = aligned_frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue

        # Convert images to numpy arrays
        color_image = np.asanyarray(color_frame.get_data())
        marker_1_end = time.time()
        print(f"Marker 1 Time for capturing frame: {(marker_1_end - marker_1_start) * 1000:.2f} ms")

        # Marker 3: time taken for object detection in the frame
        marker_3_start = time.time()
        # Run YOLO inference
        results = model(color_image, verbose=True, show=False)
        marker_3_end = time.time()
        print(f"Marker 3 Time for object detection: {(marker_3_end - marker_3_start) * 1000:.2f} ms")

        # Calculate FPS
        end_time = cv2.getTickCount()
        frame_time = (end_time - start_time) / cv2.getTickFrequency()
        fps = 1 / frame_time

        # Render FPS on the video frame
        cv2.putText(color_image, f"FPS: {fps:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # Show the image
        cv2.namedWindow('RGB Image', cv2.WINDOW_AUTOSIZE)
        cv2.imshow('RGB Image', color_image)

        # Break the loop if 'q' key is pressed
        if cv2.waitKey(1) == ord('q'):
            break
finally:
    # Stop streaming
    pipeline.stop()
    # Close all OpenCV windows
    cv2.destroyAllWindows()
Thanks