• Jetson Orin
• DeepStream 7.0
• JetPack Version 6.0
• FPS drops
• Run my pipeline
• custom pipeline with yolo
Hello,
I have an FPS drop issue correlated with a memory leak in my DeepStream C++ YOLO pipeline.
The FPS stays constant at ~30 for 3-4 hours, then drops to 20 until the memory usage gets too high and the app has to restart.
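To confirm that the drop really tracks the memory growth, I could log the process resident memory next to the FPS messages. A minimal sketch (hypothetical helper, not part of my current code, Linux /proc only):

#include <fstream>
#include <string>
#include <glib.h>

// Hypothetical helper: reads VmRSS from /proc/self/status so it can be
// printed alongside the FPS messages and correlated with the drop.
static long read_vmrss_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {
            // Line looks like "VmRSS:   123456 kB"
            return std::stol(line.substr(6));
        }
    }
    return -1;
}

// Example: run every 10 s from the pipeline's GLib main loop, e.g.
// g_timeout_add_seconds(10, log_rss_cb, NULL);
static gboolean log_rss_cb(gpointer /*user_data*/) {
    g_print("VmRSS: %ld kB\n", read_vmrss_kb());
    return G_SOURCE_CONTINUE;
}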
I am using the following pipeline with YOLOv8s:
pipeline:
- v4l2src:
device: /dev/video0
- capsfilter:
caps: "image/jpeg, width=1920, height=1080, framerate=30/1"
- jpegdec: {}
- videoconvert: {}
- nvvideoconvert: {}
- capsfilter:
caps: "video/x-raw(memory:NVMM), format=RGBA, width=1920, height=1080, framerate=30/1"
- mux.sink_0:
nvstreammux:
name: mux
batch-size: 1
width: 1920
height: 1080
batched-push-timeout: 4000000
live-source: 1
num-surfaces-per-frame: 1
sync-inputs: 0
max-latency: 0
- nvinfer:
name: primary-inference
config-file-path: ../infer_cfg/YOLOV8S.yml
- nvtracker:
tracker-width: 640
tracker-height: 384
gpu-id: 0
ll-lib-file: /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file: ../infer_cfg/config_tracker_NvDCF_perf.yml
- nvdsanalytics:
name: "analytics"
config-file: ../infer_cfg/analytics.txt
- nvvideoconvert: {}
- nvdsosd:
name: onscreendisplay
- fpsdisplaysink:
name: fps-display
video-sink: nveglglessink
text-overlay: false
sync: false
Here are the parameters of the model:
property:
gpu-id: 0 # Ensure you're using the correct GPU ID
net-scale-factor: 0.0039215697906911373 # Normalize the input images (1/255)
model-color-format: 0 # 0 = RGB, 1 = BGR, 2 = GRAY; ensure this matches the model's expected format
onnx-file: ../../models/yolov8s/yolov8s.onnx # Path to the ONNX model file
int8-calib-file: calib.table # Only used with INT8 precision (network-mode=1); ignored in FP16 mode
labelfile-path: ../../models/yolov8s/labels.txt # Path to the label file
model-engine-file: ../../models/yolov8s/YOLOV8S.engine # Path to the TensorRT engine file
batch-size: 1 # Consider increasing if you can afford the memory (e.g., 2 or 4)
network-mode: 2 # 0 for FP32, 1 for INT8, 2 for FP16 (adjust based on your accuracy/performance needs)
num-detected-classes: 80 # Total number of classes in your model
interval: 0 # Process every frame; adjust for performance vs. accuracy trade-offs
gie-unique-id: 1 # Unique ID for this GIE (can be incremented if multiple GIEs are used)
process-mode: 1 # 1 = primary (full frame), 2 = secondary (operate on detected objects)
network-type: 0 # 0 = detector, 1 = classifier, 2 = segmentation
cluster-mode: 2 # 2 for clustering based on NMS
maintain-aspect-ratio: 1 # Keep aspect ratio of input images
symmetric-padding: 1 # Use symmetric padding to avoid distortion
force-implicit-batch-dim: 1 # Force implicit batch dimension mode; ONNX models typically use an explicit batch, so this may not apply here
workspace-size: 2000 # Increase workspace size for larger models; tune based on GPU memory
parse-bbox-func-name: NvDsInferParseYoloCuda # Ensure this is the optimized function for YOLO
custom-lib-path: ../../models/custom_lib/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so # Path to custom library
engine-create-func-name: NvDsInferYoloCudaEngineGet # Function name for engine creation
filter-out-class-ids: 4;6;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30;31;32;33;34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79;80 # IDs to filter out, ensure these are correct based on your use case
class-attrs-all:
pre-cluster-threshold: 0.2 # Adjust based on your requirements
eps: 0.2 # Tune for NMS sensitivity
group-threshold: 1 # Keep as is for single detection; increase if grouping is needed
My only probe is the FPS logging callback probe:
extern "C" {
GstPadProbeReturn logfps_probe_callback(GstPad* pad, GstPadProbeInfo* info, gpointer u_data) {
gchar *msg = NULL;
// Retrieve the last message property from the user data object
g_object_get(G_OBJECT(u_data), "last-message", &msg, NULL);
if (msg != NULL) {
// Print the FPS information
g_print("Fps info: %s\n", msg);
}
// Free the message string
g_free(msg);
return GST_PAD_PROBE_OK;
}
}
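For context, here is roughly how the probe is attached; a minimal sketch, assuming the element name "fps-display" from the pipeline config above (the actual application code may differ):

#include <gst/gst.h>

// Hypothetical helper: attaches logfps_probe_callback to the sink pad of the
// fpsdisplaysink named "fps-display" in the pipeline config above.
static void attach_fps_probe(GstElement *pipeline) {
    GstElement *fps_sink = gst_bin_get_by_name(GST_BIN(pipeline), "fps-display");
    GstPad *sink_pad = gst_element_get_static_pad(fps_sink, "sink");

    // The fpsdisplaysink itself is passed as user data so the callback can
    // read its "last-message" property; the destroy-notify releases the
    // reference taken by gst_bin_get_by_name() when the probe is removed.
    gst_pad_add_probe(sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
                      logfps_probe_callback, fps_sink,
                      (GDestroyNotify)gst_object_unref);

    gst_object_unref(sink_pad);
}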
The drop happens both outside and inside a Docker image. Here are my docker-compose parameters:
vision:
build:
context: .
dockerfile: ./vision/Dockerfile
args:
CUDA_VER: 12.2
DEEPSTREAM_TAG: 7.0-samples-multiarch
container_name: vision
image: bedrock/vision
runtime: nvidia
network_mode: host
volumes:
- /tmp/.X11-unix:/tmp/.X11-unix
- ./data:/data
- ./vision/cfg:/app/cfg
devices:
- /dev/video0
environment:
- DISPLAY=${DISPLAY}
- TZ=Europe/Berlin
- CUDA_VER=12.2
working_dir: /app
command: ./build/vision ./cfg/pipeline_cfg/pipeline_main_jpeg_fps.yaml
restart: always
Here is the performance graph from fpsdisplaysink and jtop:
I didn't notice any drop on x86, maybe because the available memory is higher there.
Here is the reference of our camera:
Thanks for any help!