PERF issues with DeepStream 6.3, YOLOv8l on Jetson Orin Nano

I am experiencing a performance issue with a custom-trained YOLOv8l model on DeepStream 6.3. I followed this documentation, Deploy YOLOv8 with TensorRT and DeepStream SDK | Seeed Studio Wiki, to generate the cfg, wts, and labels.txt files, and I am running inference on a sample video, video.mp4. I am getting only about 13 FPS on average. How can I increase the FPS? I want to reach 25 to 30 FPS.

SYSTEM
Jetson Orin Nano
CUDA 11.4, V11.4.315
Ubuntu 20.04
Jetpack 5.1.2
PyTorch 2.1.0
Torchvision 0.16.1
TensorRT 8.5.2.2

OUTPUT

deepstream-app -c deepstream_app_config.txt
Deserialize yoloLayer plugin: yolo
0:00:04.379478892 206341 0xaaaac3867e30 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1988> [UID = 1]: deserialized trt engine from :/home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/model_b1_gpu0_fp32.engine
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT data 3x640x640
1 OUTPUT kFLOAT num_detections 1
2 OUTPUT kFLOAT detection_boxes 8400x4
3 OUTPUT kFLOAT detection_scores 8400
4 OUTPUT kFLOAT detection_classes 8400

0:00:04.573165375 206341 0xaaaac3867e30 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2091> [UID = 1]: Use deserialized engine model: /home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/model_b1_gpu0_fp32.engine
0:00:04.603658079 206341 0xaaaac3867e30 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/config_infer_primary.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.

** INFO: <bus_callback:239>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:225>: Pipeline running

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
**PERF: 13.56 (13.46)
**PERF: 13.74 (13.65)
**PERF: 13.77 (13.63)
**PERF: 13.76 (13.67)
**PERF: 13.77 (13.70)
Quitting
nvstreammux: Successfully handled EOS for source_id=0
App run successful

File deepstream_app_config.txt

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=3
uri=file:///home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
type=2
sync=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=0
batch-size=1
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[tests]
file-loop=0

File config_infer_primary.txt

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov8l.cfg
model-file=yolov8l.wts
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=9
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

Instead of yolov8l, why don't you try a smaller model like yolov8m or yolov8s? You could also convert the model to FP16 precision to gain more performance with minimal impact on accuracy.
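As a minimal sketch of the FP16 change (assuming the DeepStream-Yolo naming convention for the regenerated engine file): in config_infer_primary.txt, precision is selected with network-mode, where 0 = FP32, 1 = INT8, 2 = FP16. Delete the old FP32 engine or point model-engine-file at a new name so the engine is rebuilt:

[property]
network-mode=2
model-engine-file=model_b1_gpu0_fp16.engine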

You can also check the load with the tegrastats command. If the load is too high, you can consider the method @user87838 attached and set the interval parameter on nvinfer.
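The interval property goes in the [property] section of config_infer_primary.txt; interval=1 makes nvinfer skip inference on every other frame, which can nearly double pipeline FPS (a tracker is typically added so the skipped frames still carry boxes). A minimal sketch:

[property]
interval=1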

Thank you for the suggestions. I was able to get 20 FPS on the recorded video using yolov8m converted to FP16.

The deepstream_app_config.txt file above only runs the video file:

[source0]
enable=1
type=3
uri=file:///home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

but I am not able to access the camera feed, even though I tried this:

[source0]
enable=1
type=0 # Type 0 indicates V4L2 source
uri=file:///dev/video0 # or the appropriate video device node
num-sources=1
gpu-id=0
cudadec-memtype=0
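For reference, a plain V4L2 webcam in deepstream-app uses type=1 together with the camera-* keys rather than a uri. A minimal sketch, assuming a standard UVC device at /dev/video0 (this alone will not reach an SDK-only camera like the one below):

[source0]
enable=1
type=1
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
camera-v4l2-dev-node=0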

I am using a Hikrobot Machine Vision camera, which comes with its own SDK for running and accessing the camera. I am not sure how to integrate this camera with deepstream_app_config.txt.

This is the script I used (from another directory) just to run the camera and grab frames:

import sys
import threading
import os
import numpy as np
import cv2
import time
from ctypes import *
from ultralytics import YOLO

# Update the system path to include the current directory
script_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(script_dir)

# Add the MvImport directory to the system path
sys.path.append(os.path.join(script_dir, 'MvImport'))

from MvCameraControl_class import *
from CameraParams_const import *

# Global variables
g_bExit = False
frame = None

# Desired resolution for processing
desired_width = 720
desired_height = 640

# Load the YOLOv8 model
model = YOLO('yolov8m.pt')
print("YOLOv8 model is loaded")

def work_thread(cam):
    global g_bExit
    global frame

    stOutFrame = MV_FRAME_OUT()
    memset(byref(stOutFrame), 0, sizeof(stOutFrame))

    # Load libc for Linux
    libc = CDLL("libc.so.6")

    while not g_bExit:
        ret = cam.MV_CC_GetImageBuffer(stOutFrame, 1000)
        if ret == 0:
            frame_info = stOutFrame.stFrameInfo
            frame_data = (c_ubyte * frame_info.nFrameLen)()
            libc.memcpy(byref(frame_data), stOutFrame.pBufAddr, frame_info.nFrameLen)

            # Convert the raw data to a NumPy array and reshape it to the correct dimensions
            raw_image = np.ctypeslib.as_array(frame_data).reshape((frame_info.nHeight, frame_info.nWidth, 3))
            frame = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB)

            cam.MV_CC_FreeImageBuffer(stOutFrame)
        else:
            print(f"No data [0x{ret:x}]")

def initialize_camera(default_index=0):
    MvCamera.MV_CC_Initialize()

    SDKVersion = MvCamera.MV_CC_GetSDKVersion()
    print(f"SDK Version [0x{SDKVersion:x}]")

    deviceList = MV_CC_DEVICE_INFO_LIST()
    tlayerType = MV_GIGE_DEVICE | MV_USB_DEVICE

    # Enumerate devices
    ret = MvCamera.MV_CC_EnumDevices(tlayerType, deviceList)
    if ret != 0:
        print(f"Enum devices fail! ret [0x{ret:x}]")
        sys.exit()

    if deviceList.nDeviceNum == 0:
        print("Find no device!")
        sys.exit()

    print(f"Find {deviceList.nDeviceNum} devices!")

    # Print device information
    for i in range(deviceList.nDeviceNum):
        mvcc_dev_info = cast(deviceList.pDeviceInfo[i], POINTER(MV_CC_DEVICE_INFO)).contents
        if mvcc_dev_info.nTLayerType == MV_GIGE_DEVICE:
            print(f"\nGIGE device: [{i}]")
            strModeName = "".join(chr(c) for c in mvcc_dev_info.SpecialInfo.stGigEInfo.chModelName)
            print(f"Device model name: {strModeName}")
            nip = [(mvcc_dev_info.SpecialInfo.stGigEInfo.nCurrentIp >> (8 * j)) & 0xff for j in range(4)]
            print(f"Current IP: {'.'.join(map(str, nip))}\n")
        elif mvcc_dev_info.nTLayerType == MV_USB_DEVICE:
            print(f"\nU3V device: [{i}]")
            strModeName = "".join(chr(c) for c in mvcc_dev_info.SpecialInfo.stUsb3VInfo.chModelName if c != 0)
            print(f"Device model name: {strModeName}")
            strSerialNumber = "".join(chr(c) for c in mvcc_dev_info.SpecialInfo.stUsb3VInfo.chSerialNumber if c != 0)
            print(f"User serial number: {strSerialNumber}")

    # Use the default device index for testing
    nConnectionNum = default_index
    if nConnectionNum >= deviceList.nDeviceNum:
        print("Input error!")
        sys.exit()

    # Create the camera object
    cam = MvCamera()

    # Select the device and create a handle
    stDeviceList = cast(deviceList.pDeviceInfo[int(nConnectionNum)], POINTER(MV_CC_DEVICE_INFO)).contents

    ret = cam.MV_CC_CreateHandle(stDeviceList)
    if ret != 0:
        print(f"Create handle fail! ret [0x{ret:x}]")
        sys.exit()

    # Open the device
    ret = cam.MV_CC_OpenDevice(MV_ACCESS_Exclusive, 0)
    if ret != 0:
        print(f"Open device fail! ret [0x{ret:x}]")
        sys.exit()

    # For GigE cameras, set the optimal packet size
    if stDeviceList.nTLayerType == MV_GIGE_DEVICE:
        nPacketSize = cam.MV_CC_GetOptimalPacketSize()
        if int(nPacketSize) > 0:
            ret = cam.MV_CC_SetIntValue("GevSCPSPacketSize", nPacketSize)
            if ret != 0:
                print(f"Warning: Set Packet Size fail! ret [0x{ret:x}]")
        else:
            print(f"Warning: Get Packet Size fail! ret [0x{nPacketSize:x}]")

    # Set trigger mode to off
    ret = cam.MV_CC_SetEnumValue("TriggerMode", MV_TRIGGER_MODE_OFF)
    if ret != 0:
        print(f"Set trigger mode fail! ret [0x{ret:x}]")
        sys.exit()

    # Get the payload size
    stParam = MVCC_INTVALUE()
    memset(byref(stParam), 0, sizeof(MVCC_INTVALUE))
    ret = cam.MV_CC_GetIntValue("PayloadSize", stParam)
    if ret != 0:
        print(f"Get payload size fail! ret [0x{ret:x}]")
        sys.exit()
    nPayloadSize = stParam.nCurValue

    # Start grabbing images
    ret = cam.MV_CC_StartGrabbing()
    if ret != 0:
        print(f"Start grabbing fail! ret [0x{ret:x}]")
        sys.exit()

    return cam

def clean_up_camera(cam):
    ret = cam.MV_CC_StopGrabbing()
    if ret != 0:
        print(f"Stop grabbing fail! ret [0x{ret:x}]")

    ret = cam.MV_CC_CloseDevice()
    if ret != 0:
        print(f"Close device fail! ret [0x{ret:x}]")

    ret = cam.MV_CC_DestroyHandle()
    if ret != 0:
        print(f"Destroy handle fail! ret [0x{ret:x}]")

    MvCamera.MV_CC_Finalize()
    cv2.destroyAllWindows()

def main():
    global g_bExit
    global frame

    try:
        while True:
            # Initialize the camera
            cam = initialize_camera(default_index=0)  # Use the default index for testing

            # Start the frame capture thread
            hThreadHandle = threading.Thread(target=work_thread, args=(cam,))
            hThreadHandle.start()

            prev_time = time.time()
            while True:
                if frame is not None:
                    # Calculate FPS
                    curr_time = time.time()
                    fps = 1 / (curr_time - prev_time)
                    prev_time = curr_time

                    # Resize the frame
                    frame_resized = cv2.resize(frame, (desired_width, desired_height))

                    # Perform object detection on the resized frame
                    results = model(frame_resized)

                    # Draw bounding boxes and labels on the frame
                    if len(results) > 0 and hasattr(results[0], 'boxes'):
                        for result in results:
                            boxes = result.boxes
                            if len(boxes) > 0:
                                for box in boxes:
                                    # Get coordinates from the tensor
                                    x1, y1, x2, y2 = box.xyxy[0].cpu().numpy().astype(int)
                                    class_id = int(box.cls[0].cpu().numpy())
                                    conf = box.conf[0].cpu().numpy()

                                    # Draw the bounding box
                                    color = (0, 255, 0)  # Green
                                    cv2.rectangle(frame_resized, (x1, y1), (x2, y2), color, 2)

                                    # Draw the label
                                    label = f"{model.names[class_id]} {conf:.2f}"
                                    cv2.putText(frame_resized, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

                    # Show the frame with FPS
                    cv2.putText(frame_resized, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
                    cv2.imshow("Detection", frame_resized)

                    key = cv2.waitKey(1) & 0xFF
                    if key == ord('q'):
                        g_bExit = True
                        break
                else:
                    print("No frame data available!")

            # Clean up; wait for the capture thread so the camera is not
            # re-initialized after 'q' has been pressed
            hThreadHandle.join()
            clean_up_camera(cam)
            if g_bExit:
                break

    except Exception as e:
        print(f"Exception occurred: {e}")
        if 'cam' in locals():
            clean_up_camera(cam)

if __name__ == "__main__":
    main()

You can refer to our FAQ to set up your camera source.
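If the FAQ does not cover a vendor-SDK camera directly, one common bridge is to push the SDK's frames into a GStreamer appsrc. A minimal sketch, not an official recipe: the pipeline string, caps, and resolution are assumptions, and for a DeepStream pipeline the videoconvert/autovideosink stage would be replaced with nvvideoconvert, nvstreammux, and nvinfer:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import numpy as np

Gst.init(None)

WIDTH, HEIGHT, FPS = 1280, 720, 25  # must match the frames the SDK delivers

# Assumed display pipeline; swap the tail for DeepStream elements as needed
pipeline = Gst.parse_launch(
    "appsrc name=src is-live=true format=time "
    f"caps=video/x-raw,format=RGB,width={WIDTH},height={HEIGHT},framerate={FPS}/1 "
    "! videoconvert ! autovideosink sync=false"
)
appsrc = pipeline.get_by_name("src")
frame_count = 0

def push_frame(frame: np.ndarray):
    """Wrap one HxWx3 RGB frame in a Gst.Buffer and push it into the pipeline."""
    global frame_count
    data = frame.tobytes()
    buf = Gst.Buffer.new_allocate(None, len(data), None)
    buf.fill(0, data)
    buf.pts = frame_count * Gst.SECOND // FPS  # synthetic timestamps
    buf.duration = Gst.SECOND // FPS
    frame_count += 1
    appsrc.emit("push-buffer", buf)

pipeline.set_state(Gst.State.PLAYING)
# Inside the Hikrobot grab loop (work_thread above), call push_frame(frame)
# for each frame returned by MV_CC_GetImageBuffer.

The caps string assumes the SDK delivers packed RGB; the actual pixel format from MV_CC_GetImageBuffer depends on the camera's PixelFormat setting.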

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.