I am working on a system which does the following:
- Capture video frame
- Split frame into multiple images, with each representing an ROI
- Perform image classification on each image
- Aggregate the classifications of all images belonging to a frame; that aggregation is the output for the frame
I went through the nvgstiva app, but since the above is an unconventional model, I am finding it difficult to come up with a design to port it to DeepStream. Can you please provide some suggestions for the DeepStream pipeline?
<<How does the OpenCV preprocessing separate the frame into multiple images? Do you have a detection model, or just a classification model?>>
I have the ROI coordinates in hand. For each ROI:
- Create a mask image of all zeros (except the ROI) using cv2.fillPoly
- masked_image = frame & mask
- Perform inference on masked_image
I am using an image classification model, AlexNet.
<<Which platform will you use?>>
<<Multiple sources, or just one channel?>>
One channel (at least for now).
I think it’s OK to port it to DeepStream. You need to implement your own application, but don’t base it on the nvgstiva app.
The pipeline is like this
opencv capture source + preprocess (masked_image) ->
appsrc plugin ->
gst-nvinfer (sgie) plugin ->
gst-nvosd … -> get your metadata in the application.
I guess the video source is raw data, so no decoding is needed.
For gst-nvinfer, set your own network/model properties.
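As a sketch, the pipeline above could be assembled as a launch string and handed to Gst.parse_launch. The config file name, caps, and element names below are assumptions for illustration (check the exact plugin names and properties your DeepStream version ships, e.g. nvdsosd vs. nvosd for the gst-nvosd stage):

```python
def build_pipeline_desc(infer_config="alexnet_classifier_config.txt",
                        width=1280, height=720):
    """Build a gst-launch-style description for the appsrc -> nvinfer pipeline."""
    # Caps the OpenCV side pushes masked frames with (assumption: RGBA raw video).
    caps = f"video/x-raw,format=RGBA,width={width},height={height},framerate=30/1"
    elements = [
        f'appsrc name=roi_src caps="{caps}"',        # masked_image buffers pushed from the app
        "nvvideoconvert",                            # convert/copy into NVMM memory for nvinfer
        f"nvinfer config-file-path={infer_config}",  # your AlexNet network/model properties
        "nvdsosd",                                   # on-screen display stage; optional if headless
        "fakesink name=sink sync=false",             # metadata is read via a pad probe instead
    ]
    return " ! ".join(elements)

desc = build_pipeline_desc()
# e.g. pipeline = Gst.parse_launch(desc) after Gst.init(None)
```

Classification metadata would then be pulled in a pad probe on one of the downstream elements rather than from the sink itself.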
If you don’t have much GStreamer experience, I suggest you use the low-level TensorRT API directly. It’s easier for your case.
Thanks, I appreciate you taking the time to respond. A couple more questions, please:
- What will be the sink for the GStreamer pipeline?
- Also, will there be any performance difference between using TensorRT directly and the mentioned GStreamer pipeline?