• Issue Type (questions, new requirements, bugs)
new requirements
Opening Discussion & Implementation Initiative
Hi DeepStream community and NVIDIA team!
I'm opening this discussion to propose a significant enhancement to DeepStream that I believe will benefit the entire ecosystem. I'm planning to start implementing this feature and would love support and guidance from both the community and NVIDIA to achieve this goal together.
This enhancement has the potential to dramatically improve performance and efficiency across many DeepStream use cases, and I'm excited to collaborate with everyone to make it a reality.
Summary
Request for a new "Slice Processing" feature in DeepStream's nvdspreprocess plugin as an evolution of the current ROI processing capabilities.
Background
Currently, the nvdspreprocess plugin supports ROI (Region of Interest) processing where each ROI is processed individually, generating separate tensors and requiring multiple inference calls. While this approach is valuable for many use cases, there's an opportunity to enhance performance and model efficiency through a complementary approach.
Proposed Feature: Slice Processing
Naming Consideration
Since "ROI" is already extensively used in DeepStream Analytics modules, I suggest using "Slice" terminology for this new feature to avoid confusion and better represent its functionality.
Core Concept
The Slice Processing feature would allow users to:
- Define a target frame resolution for the combined output
- Select multiple slice regions from the original frame
- Combine slices into a single mosaic frame at the target resolution
- Perform a single inference on the combined frame instead of multiple separate inferences
Technical Advantages
Performance Benefits
- Reduced inference overhead: a single inference call instead of multiple calls
- Better GPU utilization: larger batch processing on a single tensor
- Lower memory fragmentation: one large tensor vs. multiple small tensors
Model Efficiency
- Enhanced object visibility: objects appear larger in the combined frame due to cropping and scaling
- Smaller/lighter models: higher effective resolution allows for more efficient model architectures
- Improved detection accuracy: objects of interest get more pixel representation
Use Case Example
Current ROI Processing:
Original Frame: 1920x1080
ROI 1: 400x400 → Individual inference → Tensor 1
ROI 2: 500x500 → Individual inference → Tensor 2
Total: 2 inference calls
Proposed Slice Processing:
Original Frame: 1920x1080
Slice 1: 400x400 →
Slice 2: 500x500 → Combined into 1280x720 mosaic → Single inference
Total: 1 inference call with higher effective resolution per object
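To make "higher effective resolution" concrete, here is a small arithmetic sketch (my own illustration, not part of the proposal) of the per-slice scale factor when the two example slices are packed side by side into two hypothetical 640x720 cells of the 1280x720 mosaic:

```python
# Hypothetical helper: the uniform scale factor that fits a slice into
# its mosaic cell while preserving aspect ratio. Cell sizes here are an
# assumption for illustration; the proposal does not fix a cell layout.
def slice_scale(slice_w, slice_h, cell_w, cell_h):
    return min(cell_w / slice_w, cell_h / slice_h)

print(slice_scale(400, 400, 640, 720))  # slice 1 is upscaled 1.6x
print(slice_scale(500, 500, 640, 720))  # slice 2 is upscaled 1.28x
```

So both slices end up with more pixels per object than they had in the original frame, which is the efficiency argument above.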
Configuration Proposal
REST API Endpoint
POST /api/v1/slice/update
JSON Schema
```json
{
  "stream": {
    "stream_id": "0",
    "slice_mode": "mosaic",
    "target_resolution": {
      "width": 1280,
      "height": 720
    },
    "slice_count": 2,
    "slices": [
      {
        "slice_id": "0",
        "left": 100,
        "top": 300,
        "width": 400,
        "height": 400,
        "position_in_mosaic": {
          "x": 0,
          "y": 0
        }
      },
      {
        "slice_id": "1",
        "left": 550,
        "top": 300,
        "width": 500,
        "height": 500,
        "position_in_mosaic": {
          "x": 640,
          "y": 0
        }
      }
    ]
  }
}
```
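A client could sanity-check a payload in this shape before sending it to the proposed endpoint. This is a hedged sketch of my own: the field names follow the JSON schema proposed above, and none of this exists in DeepStream today.

```python
# Hypothetical client-side validation for the proposed
# /api/v1/slice/update payload. Checks only structural invariants:
# slice_count matches, anchors fall inside the target frame, and
# slice dimensions are positive.
def validate_slice_config(payload):
    stream = payload["stream"]
    target = stream["target_resolution"]
    if stream["slice_count"] != len(stream["slices"]):
        return False
    for s in stream["slices"]:
        pos = s["position_in_mosaic"]
        # a slice's mosaic anchor must lie inside the target frame
        if not (0 <= pos["x"] < target["width"]
                and 0 <= pos["y"] < target["height"]):
            return False
        if s["width"] <= 0 or s["height"] <= 0:
            return False
    return True
```

Overlap checks between mosaic cells could be added once the scaling semantics are settled.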
Configuration File Extension
```ini
[property]
enable-slice-processing=1
slice-target-width=1280
slice-target-height=720
slice-mosaic-mode=1

[slice-group-0]
src-ids=0
slice-count=2
slice-0-params=100;300;400;400;0;0
slice-1-params=550;300;500;500;640;0
```
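The `slice-N-params` values above pack six semicolon-separated integers: source left, top, width, height, then the mosaic x/y position, mirroring the JSON schema. A minimal parsing sketch (the key layout is my proposal, not an existing DeepStream config key):

```python
# Hypothetical parser for the proposed slice-N-params value:
# "left;top;width;height;mosaic_x;mosaic_y" -> dict
def parse_slice_params(value):
    left, top, w, h, mx, my = (int(v) for v in value.split(";"))
    return {"left": left, "top": top, "width": w, "height": h,
            "mosaic_x": mx, "mosaic_y": my}

print(parse_slice_params("100;300;400;400;0;0"))
```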
Implementation Considerations
Processing Pipeline
1. Extract slices from the original frame based on coordinates
2. Scale/resize each slice to fit the target mosaic layout
3. Combine slices into a single frame buffer
4. Apply standard preprocessing (normalization, format conversion)
5. Generate a single tensor for inference
6. Map inference results back to original frame coordinates
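The steps above can be sketched as a CPU reference for the extract/combine and coordinate-mapping stages. This is purely illustrative (no scaling, nested lists instead of GPU buffers); the real plugin would do the copy and resize with CUDA kernels:

```python
# Hypothetical CPU reference for steps 1, 3 and 6.
# slice_cfg uses the same fields as the proposed config:
# left/top/width/height in the source frame, x/y in the mosaic.

def paste_slice(mosaic, frame, slice_cfg):
    # Direct pixel copy into the mosaic cell; step 2 (scaling to the
    # cell size) is omitted for brevity.
    for dy in range(slice_cfg["height"]):
        for dx in range(slice_cfg["width"]):
            mosaic[slice_cfg["y"] + dy][slice_cfg["x"] + dx] = \
                frame[slice_cfg["top"] + dy][slice_cfg["left"] + dx]

def map_box_to_frame(box, slice_cfg):
    # Inverse of the paste above: translate a detection box from
    # mosaic coordinates back into original-frame coordinates.
    # (With scaling enabled, the box would also be divided by the
    # slice's scale factor.)
    x, y, w, h = box
    return (x - slice_cfg["x"] + slice_cfg["left"],
            y - slice_cfg["y"] + slice_cfg["top"], w, h)
```

The mapping step is what keeps downstream metadata (e.g. NvDsObjectMeta bounding boxes) consistent with the original stream.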
Memory Management
- Efficient slice extraction using CUDA kernels
- Optimized memory copying for mosaic combination
- Support for different memory types (device, pinned, unified)
Backward Compatibility
- Maintain existing ROI functionality unchanged
- Add slice processing as an additional, opt-in feature
- Allow per-stream configuration of ROI vs. Slice mode
Benefits Summary
- Performance: reduced inference calls and better GPU utilization
- Efficiency: lighter models with better object representation
- Flexibility: choose between ROI (multiple inferences) or Slice (single inference) based on use case
- Scalability: better resource management for multi-stream scenarios
- Accuracy: enhanced object detection through improved pixel density
Request for Community Feedback
This enhancement would significantly benefit applications requiring:
- High-performance multi-region analysis
- Resource-constrained deployments
- Real-time processing of high-resolution streams
- Efficient object detection in specific areas of interest
I'd love to hear thoughts from the community and the NVIDIA DeepStream team on the feasibility and potential implementation of this feature.