PERF issues with DeepStream 6.3, YOLOv8l on Jetson Orin Nano

I am experiencing a performance issue with a custom-trained YOLOv8l model on DeepStream 6.3. I followed this documentation, Deploy YOLOv8 with TensorRT and DeepStream SDK | Seeed Studio Wiki, to generate the cfg, wts, and labels.txt files, and I am running inference on a sample video, video.mp4. I am getting only about 13 FPS on average. How can I increase the FPS? I want to reach 25 to 30 FPS.

SYSTEM
Jetson Orin Nano
CUDA 11.4, V11.4.315
Ubuntu 20.04
Jetpack 5.1.2
PyTorch 2.1.0
Torchvision 0.16.1
TensorRT 8.5.2.2

OUTPUT

deepstream-app -c deepstream_app_config.txt
Deserialize yoloLayer plugin: yolo
0:00:04.379478892 206341 0xaaaac3867e30 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1988> [UID = 1]: deserialized trt engine from :/home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/model_b1_gpu0_fp32.engine
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT data 3x640x640
1 OUTPUT kFLOAT num_detections 1
2 OUTPUT kFLOAT detection_boxes 8400x4
3 OUTPUT kFLOAT detection_scores 8400
4 OUTPUT kFLOAT detection_classes 8400

0:00:04.573165375 206341 0xaaaac3867e30 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2091> [UID = 1]: Use deserialized engine model: /home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/model_b1_gpu0_fp32.engine
0:00:04.603658079 206341 0xaaaac3867e30 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/config_infer_primary.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.

** INFO: <bus_callback:239>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:225>: Pipeline running

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
**PERF: 13.56 (13.46)
**PERF: 13.74 (13.65)
**PERF: 13.77 (13.63)
**PERF: 13.76 (13.67)
**PERF: 13.77 (13.70)
Quitting
nvstreammux: Successfully handled EOS for source_id=0
App run successful

File deepstream_app_config.txt

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=3
uri=file:///home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
type=2
sync=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=0
batch-size=1
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[tests]
file-loop=0

File config_infer_primary.txt

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov8l.cfg
model-file=yolov8l.wts
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=9
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

Instead of yolov8l, why don't you try a smaller model like yolov8m or yolov8s? You could also convert the model to FP16 precision to gain more performance with minimal impact on accuracy.
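As a minimal sketch of the FP16 change (assuming the DeepStream-Yolo naming convention for the regenerated engine file): in config_infer_primary.txt, precision is selected with network-mode, where 0 = FP32, 1 = INT8, 2 = FP16. Delete the old FP32 engine or point model-engine-file at a new name so the engine is rebuilt:

[property]
network-mode=2
model-engine-file=model_b1_gpu0_fp16.engine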

You can also check the load with the tegrastats command. If the load is too high, you can consider the method @user87838 attached and set the interval parameter on nvinfer.
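The interval property goes in the [property] section of config_infer_primary.txt; interval=1 makes nvinfer skip inference on every other frame, which can nearly double pipeline FPS (a tracker is typically added so the skipped frames still carry boxes). A minimal sketch:

[property]
interval=1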

Thank you for the suggestions. I was able to get 20 FPS on the recorded video using yolov8m converted to FP16.

The deepstream_app_config.txt file above only runs the video file:

[source0]
enable=1
type=3
uri=file:///home/kodifly/DeepStream-Yolo/custom_model_9_classes/DeepStream-Yolo/video.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

but I am not able to access the camera feed, even though I tried this:

[source0]
enable=1
type=0 # Type 0 indicates V4L2 source
uri=file:///dev/video0 # or the appropriate video device node
num-sources=1
gpu-id=0
cudadec-memtype=0
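For reference, a plain V4L2 webcam in deepstream-app uses type=1 together with the camera-* keys rather than a uri. A minimal sketch, assuming a standard UVC device at /dev/video0 (this alone will not reach an SDK-only camera like the one below):

[source0]
enable=1
type=1
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
camera-v4l2-dev-node=0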

I am using a Hikrobot Machine Vision camera, which comes with its own SDK for running and accessing the camera. I am not sure how to integrate this camera with deepstream_app_config.txt.

This is the script I used (from another directory) just to run the camera and grab frames:

import sys
import threading
import os
import numpy as np
import cv2
import time
from ctypes import *
from ultralytics import YOLO

# Update the system path to include the current directory
script_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(script_dir)

# Add the MvImport directory to the system path
sys.path.append(os.path.join(script_dir, 'MvImport'))

from MvCameraControl_class import *
from CameraParams_const import *

# Global variables
g_bExit = False
frame = None

# Desired resolution for processing
desired_width = 720
desired_height = 640

# Load the YOLOv8 model
model = YOLO('yolov8m.pt')
print("YOLOv8 model is loaded")

def work_thread(cam):
    global g_bExit
    global frame

    stOutFrame = MV_FRAME_OUT()
    memset(byref(stOutFrame), 0, sizeof(stOutFrame))

    # Load libc for Linux
    libc = CDLL("libc.so.6")

    while not g_bExit:
        ret = cam.MV_CC_GetImageBuffer(stOutFrame, 1000)
        if ret == 0:
            frame_info = stOutFrame.stFrameInfo
            frame_data = (c_ubyte * frame_info.nFrameLen)()
            libc.memcpy(byref(frame_data), stOutFrame.pBufAddr, frame_info.nFrameLen)

            # Convert the raw data to a NumPy array and reshape it to the correct dimensions
            raw_image = np.ctypeslib.as_array(frame_data).reshape((frame_info.nHeight, frame_info.nWidth, 3))
            frame = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB)

            cam.MV_CC_FreeImageBuffer(stOutFrame)
        else:
            print(f"No data [0x{ret:x}]")

def initialize_camera(default_index=0):
    MvCamera.MV_CC_Initialize()

    SDKVersion = MvCamera.MV_CC_GetSDKVersion()
    print(f"SDK Version [0x{SDKVersion:x}]")

    deviceList = MV_CC_DEVICE_INFO_LIST()
    tlayerType = MV_GIGE_DEVICE | MV_USB_DEVICE

    # Enumerate devices
    ret = MvCamera.MV_CC_EnumDevices(tlayerType, deviceList)
    if ret != 0:
        print(f"Enum devices fail! ret [0x{ret:x}]")
        sys.exit()

    if deviceList.nDeviceNum == 0:
        print("Find no device!")
        sys.exit()

    print(f"Find {deviceList.nDeviceNum} devices!")

    # Print device information
    for i in range(deviceList.nDeviceNum):
        mvcc_dev_info = cast(deviceList.pDeviceInfo[i], POINTER(MV_CC_DEVICE_INFO)).contents
        if mvcc_dev_info.nTLayerType == MV_GIGE_DEVICE:
            print(f"\nGIGE device: [{i}]")
            strModeName = "".join(chr(c) for c in mvcc_dev_info.SpecialInfo.stGigEInfo.chModelName)
            print(f"Device model name: {strModeName}")
            nip = [(mvcc_dev_info.SpecialInfo.stGigEInfo.nCurrentIp >> (8 * j)) & 0xff for j in range(4)]
            print(f"Current IP: {'.'.join(map(str, nip))}\n")
        elif mvcc_dev_info.nTLayerType == MV_USB_DEVICE:
            print(f"\nU3V device: [{i}]")
            strModeName = "".join(chr(c) for c in mvcc_dev_info.SpecialInfo.stUsb3VInfo.chModelName if c != 0)
            print(f"Device model name: {strModeName}")
            strSerialNumber = "".join(chr(c) for c in mvcc_dev_info.SpecialInfo.stUsb3VInfo.chSerialNumber if c != 0)
            print(f"User serial number: {strSerialNumber}")

    # Use the default device index for testing
    nConnectionNum = default_index
    if nConnectionNum >= deviceList.nDeviceNum:
        print("Input error!")
        sys.exit()

    # Create the camera object
    cam = MvCamera()

    # Select the device and create a handle
    stDeviceList = cast(deviceList.pDeviceInfo[int(nConnectionNum)], POINTER(MV_CC_DEVICE_INFO)).contents

    ret = cam.MV_CC_CreateHandle(stDeviceList)
    if ret != 0:
        print(f"Create handle fail! ret [0x{ret:x}]")
        sys.exit()

    # Open the device
    ret = cam.MV_CC_OpenDevice(MV_ACCESS_Exclusive, 0)
    if ret != 0:
        print(f"Open device fail! ret [0x{ret:x}]")
        sys.exit()

    # For GigE cameras, set the optimal packet size
    if stDeviceList.nTLayerType == MV_GIGE_DEVICE:
        nPacketSize = cam.MV_CC_GetOptimalPacketSize()
        if int(nPacketSize) > 0:
            ret = cam.MV_CC_SetIntValue("GevSCPSPacketSize", nPacketSize)
            if ret != 0:
                print(f"Warning: Set Packet Size fail! ret [0x{ret:x}]")
        else:
            print(f"Warning: Get Packet Size fail! ret [0x{nPacketSize:x}]")

    # Set trigger mode to off
    ret = cam.MV_CC_SetEnumValue("TriggerMode", MV_TRIGGER_MODE_OFF)
    if ret != 0:
        print(f"Set trigger mode fail! ret [0x{ret:x}]")
        sys.exit()

    # Get the payload size
    stParam = MVCC_INTVALUE()
    memset(byref(stParam), 0, sizeof(MVCC_INTVALUE))
    ret = cam.MV_CC_GetIntValue("PayloadSize", stParam)
    if ret != 0:
        print(f"Get payload size fail! ret [0x{ret:x}]")
        sys.exit()
    nPayloadSize = stParam.nCurValue

    # Start grabbing images
    ret = cam.MV_CC_StartGrabbing()
    if ret != 0:
        print(f"Start grabbing fail! ret [0x{ret:x}]")
        sys.exit()

    return cam

def clean_up_camera(cam):
    ret = cam.MV_CC_StopGrabbing()
    if ret != 0:
        print(f"Stop grabbing fail! ret [0x{ret:x}]")

    ret = cam.MV_CC_CloseDevice()
    if ret != 0:
        print(f"Close device fail! ret [0x{ret:x}]")

    ret = cam.MV_CC_DestroyHandle()
    if ret != 0:
        print(f"Destroy handle fail! ret [0x{ret:x}]")

    MvCamera.MV_CC_Finalize()
    cv2.destroyAllWindows()

def main():
    global g_bExit
    global frame

    try:
        while True:
            # Initialize the camera
            cam = initialize_camera(default_index=0)  # Use the default index for testing

            # Start the frame capture thread
            hThreadHandle = threading.Thread(target=work_thread, args=(cam,))
            hThreadHandle.start()

            prev_time = time.time()
            while True:
                if frame is not None:
                    # Calculate FPS
                    curr_time = time.time()
                    fps = 1 / (curr_time - prev_time)
                    prev_time = curr_time

                    # Resize the frame
                    frame_resized = cv2.resize(frame, (desired_width, desired_height))

                    # Perform object detection on the resized frame
                    results = model(frame_resized)

                    # Draw bounding boxes and labels on the frame
                    if len(results) > 0 and hasattr(results[0], 'boxes'):
                        for result in results:
                            boxes = result.boxes
                            if len(boxes) > 0:
                                for box in boxes:
                                    # Get coordinates from the tensor
                                    x1, y1, x2, y2 = box.xyxy[0].cpu().numpy().astype(int)
                                    class_id = int(box.cls[0].cpu().numpy())
                                    conf = box.conf[0].cpu().numpy()

                                    # Draw the bounding box
                                    color = (0, 255, 0)  # Green
                                    cv2.rectangle(frame_resized, (x1, y1), (x2, y2), color, 2)

                                    # Draw the label
                                    label = f"{model.names[class_id]} {conf:.2f}"
                                    cv2.putText(frame_resized, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

                    # Show the frame with FPS
                    cv2.putText(frame_resized, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
                    cv2.imshow("Detection", frame_resized)

                    key = cv2.waitKey(1) & 0xFF
                    if key == ord('q'):
                        g_bExit = True
                        break
                else:
                    print("No frame data available!")

            # Clean up; wait for the capture thread so the camera is not
            # re-initialized after 'q' has been pressed
            hThreadHandle.join()
            clean_up_camera(cam)
            if g_bExit:
                break

    except Exception as e:
        print(f"Exception occurred: {e}")
        if 'cam' in locals():
            clean_up_camera(cam)

if __name__ == "__main__":
    main()

You can refer to our FAQ to set up your camera source.
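If the FAQ does not cover a vendor-SDK camera directly, one common bridge is to push the SDK's frames into a GStreamer appsrc. A minimal sketch, not an official recipe: the pipeline string, caps, and resolution are assumptions, and for a DeepStream pipeline the videoconvert/autovideosink stage would be replaced with nvvideoconvert, nvstreammux, and nvinfer:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import numpy as np

Gst.init(None)

WIDTH, HEIGHT, FPS = 1280, 720, 25  # must match the frames the SDK delivers

# Assumed display pipeline; swap the tail for DeepStream elements as needed
pipeline = Gst.parse_launch(
    "appsrc name=src is-live=true format=time "
    f"caps=video/x-raw,format=RGB,width={WIDTH},height={HEIGHT},framerate={FPS}/1 "
    "! videoconvert ! autovideosink sync=false"
)
appsrc = pipeline.get_by_name("src")
frame_count = 0

def push_frame(frame: np.ndarray):
    """Wrap one HxWx3 RGB frame in a Gst.Buffer and push it into the pipeline."""
    global frame_count
    data = frame.tobytes()
    buf = Gst.Buffer.new_allocate(None, len(data), None)
    buf.fill(0, data)
    buf.pts = frame_count * Gst.SECOND // FPS  # synthetic timestamps
    buf.duration = Gst.SECOND // FPS
    frame_count += 1
    appsrc.emit("push-buffer", buf)

pipeline.set_state(Gst.State.PLAYING)
# Inside the Hikrobot grab loop (work_thread above), call push_frame(frame)
# for each frame returned by MV_CC_GetImageBuffer.

The caps string assumes the SDK delivers packed RGB; the actual pixel format from MV_CC_GetImageBuffer depends on the camera's PixelFormat setting.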

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.