Hi everyone,
I created deepstream-sahi, a small project that integrates NVIDIA DeepStream with SAHI (Slicing Aided Hyper Inference) to improve detection—especially small objects—by running inference on sliced tiles.
Repo: https://github.com/hasantahabagci/deepstream-sahi
Key points
- ROS2 package with a single node (deepstream_node)
- Two modes: base (standard DeepStream) and sahi (DeepStream + SAHI slicing)
- Easy configuration for model/labels + launch support
Looking for feedback
- Best practices to reduce the latency overhead of slicing
- Recommendations for DeepStream/TensorRT settings when running tiled inference
- Any known patterns for batching/tiling optimizations in DeepStream
Thanks!
Hi @hasantahabagci,
Great project!
The current implementation has two areas that could be improved with dedicated GStreamer plugins, which would make the project significantly more robust and easier to use:
1. nvsahipreprocess — dynamic slice preprocessor
Right now the slicing is handled by nvdspreprocess with a static config file where ROIs are hardcoded for a specific resolution. This means every time you change resolution or tuning parameters, you have to manually regenerate config_preprocess_sahi.txt.
Since nvdspreprocess is open source (available in the DeepStream SDK sources under sources/gst-plugins/gst-nvdspreprocess/), it's feasible to build an nvsahipreprocess plugin on top of it that:
- Computes ROIs dynamically at runtime based on the incoming frame resolution
- Exposes slice parameters as GStreamer properties (slice-width, slice-height, overlap-h, overlap-v) — no static config file needed
- Automatically sets the correct network-input-shape for nvinfer
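To make the dynamic-ROI idea concrete, here is a minimal Python sketch of the tiling math such a plugin would perform at runtime. The function name and tuple format are illustrative, not part of the repo or the DeepStream API; overlaps are fractions of the slice size, and edge slices are shifted inward so every ROI stays inside the frame:

```python
def compute_slice_rois(frame_w, frame_h, slice_w, slice_h, overlap_h, overlap_v):
    """Compute (left, top, width, height) ROIs that tile the frame with the
    given slice size and fractional horizontal/vertical overlap, SAHI-style."""
    step_x = max(1, int(slice_w * (1 - overlap_h)))
    step_y = max(1, int(slice_h * (1 - overlap_v)))
    rois = []
    y = 0
    while True:
        # Shift the last row up so the slice does not run past the frame.
        top = min(y, max(0, frame_h - slice_h))
        x = 0
        while True:
            # Same inward shift for the last column.
            left = min(x, max(0, frame_w - slice_w))
            rois.append((left, top, min(slice_w, frame_w), min(slice_h, frame_h)))
            if x + slice_w >= frame_w:
                break
            x += step_x
        if y + slice_h >= frame_h:
            break
        y += step_y
    return rois

# e.g. a 1920x1080 frame with 640x640 slices and 20% overlap
rois = compute_slice_rois(1920, 1080, 640, 640, 0.2, 0.2)
```

The plugin would recompute this whenever caps change and could then derive network-input-shape from len(rois), instead of relying on hardcoded ROIs in config_preprocess_sahi.txt.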
2. nvsahipostprocess — C++ NMM (non-maximum merging) post-processor
The current merge logic runs as a Python probe on the nvinfer src pad, which will become a bottleneck at scale with multiple sources.
The right place for the merge is a dedicated GstBaseTransform plugin inserted between nvinfer and nvtracker:
nvinfer → queue → nvsahipostprocess → nvtracker → nvdsosd
- Direct C++ access to NvDsBatchMeta / NvDsFrameMeta / NvDsObjectMeta via nvdsmeta.h is significantly faster than Python probe iteration
- The tracker receives clean merged detections with no duplicates, which also improves tracking stability
The plugin would run the same greedy NMM logic you already have in pipeline_common.py, removing suppressed objects via nvds_remove_obj_meta_from_frame() and updating rect_params on survivors — just in C++ with properties for metric (ios/iou) and threshold.
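For reference, here is a minimal Python sketch of greedy NMM of the kind described above, which the plugin would reimplement in C++. The box format, names, and per-class handling are assumptions for illustration, not the actual pipeline_common.py code; overlapping same-class detections are merged into the union of their boxes, which corresponds to updating rect_params on the survivor:

```python
def ios(a, b):
    """Intersection over smaller area (IoS) of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return inter / smaller if smaller > 0 else 0.0

def greedy_nmm(dets, threshold=0.5):
    """Greedy non-maximum merging.

    dets: list of (box, score, class_id), box = (x1, y1, x2, y2).
    Detections are visited in descending score order; a detection that
    overlaps an already-kept box of the same class (IoS >= threshold) is
    folded into it by growing the kept box to the union of the two.
    """
    keep = []
    for box, score, cls in sorted(dets, key=lambda d: d[1], reverse=True):
        merged = False
        for i, (kbox, kscore, kcls) in enumerate(keep):
            if cls == kcls and ios(box, kbox) >= threshold:
                # Survivor absorbs the suppressed box (union of the two).
                keep[i] = ((min(kbox[0], box[0]), min(kbox[1], box[1]),
                            max(kbox[2], box[2]), max(kbox[3], box[3])),
                           kscore, kcls)
                merged = True
                break
        if not merged:
            keep.append((box, score, cls))
    return keep
```

In the C++ plugin the inner loop would iterate NvDsObjectMeta lists instead, calling nvds_remove_obj_meta_from_frame() for merged-away objects; swapping IoS for IoU is a one-line change behind the proposed metric property.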
Together these two plugins would turn the project from “works with manual config” into a proper plug-and-play SAHI pipeline for DeepStream.
These are just suggestions based on my experience with DeepStream pipelines — happy to hear if others have different approaches. If I get the chance to prototype either of these I’ll share it here.
Thanks for sharing the project!
Hi @Levi_Pereira,
Thanks a lot for the detailed feedback; I really appreciate the suggestions.
You’re absolutely right about both points. The current implementation using nvdspreprocess with static ROIs is mainly a prototype approach and not very flexible. A dedicated nvsahipreprocess plugin that computes slices dynamically and exposes parameters as GStreamer properties would definitely make the pipeline cleaner.
Good point as well about the Python probe. Moving the NMM merge logic into a C++ GstBaseTransform plugin between nvinfer and nvtracker would be the proper DeepStream-native solution and should also remove the Python overhead.
I’ll definitely look into these ideas, especially the dynamic slicing plugin.
Thanks again for the valuable input!