Using nvdspreprocess for improved small object detection

I have a use case for running object detection on a 1080p camera; the objects that appear in this camera are small relative to the frame resolution.

I have a TAO YOLOv3 model that I use for detection on other cameras, where it gives good accuracy for objects at multiple scales, so I decided to use it for detection here as well. The model was able to detect the objects in almost all cases, but there were quite a number of false positives.

To solve this, I thought I would use the nvdspreprocess plugin available in DeepStream to define an ROI and run detection only there. I ran a couple of experiments and will describe the results here.

The model takes an input frame of 960 × 544, so in the first experiment, instead of resizing the whole 1080p frame, I selected a 960 × 544 region of the frame and set it as the ROI with nvdspreprocess. There was an overall drop in false positives, since we were no longer running detection outside the ROI; but within the ROI itself I did not see fewer false positives, and the detection confidence for the objects decreased from > 0.95 to between 0.9 and 0.95.
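For reference, the ROI setup described above might look like the following nvdspreprocess config fragment. This is a sketch: the key names follow the sample config_preprocess.txt shipped with DeepStream, and the ROI coordinates, tensor name, and custom-lib path are placeholder/assumed values that depend on the actual model and install.

```ini
[property]
enable=1
target-unique-ids=1
# batch;channels;height;width of the model input (assumed 1x3x544x960)
network-input-shape=1;3;544;960
processing-width=960
processing-height=544
network-color-format=0
tensor-data-type=0
# tensor-name must match the model's input layer (placeholder here)
tensor-name=Input
custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/gst-plugins/libcustom2d_preprocess.so
custom-tensor-preparation-function=CustomTensorPreparation

[group-0]
src-ids=0
process-on-roi=1
# left;top;width;height of a 960x544 ROI centered in the 1080p frame (example)
roi-params-src-0=480;268;960;544
```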

The second experiment was shrinking the ROI from 960 × 544 to 480 × 277, assuming that since the ROI would be scaled up to the network input size, the objects sent for inference would be larger, and hence accuracy would improve and the number of false positives would drop. I also thought that if this worked, we could collect frames containing larger objects and use them for model training, since frames with small objects are harder to come by from a data-collection perspective. This experiment gave me bad results, with an increased number of false positives and the detected objects' confidence dropping from 0.9 to between 0.6 and 0.8.
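To make the trade-off concrete, a quick back-of-the-envelope sketch (pure arithmetic; the 32 px object size is a hypothetical example) shows roughly how large an object appears at the network input under each scheme:

```python
def scale_factor(src_w, src_h, dst_w, dst_h):
    # Per-axis scale applied when a src_w x src_h region is resized to dst_w x dst_h.
    return dst_w / src_w, dst_h / src_h

obj = 32  # hypothetical object width in pixels in the original 1080p frame

# Full-frame inference: 1920x1080 squeezed to 960x544.
sx, sy = scale_factor(1920, 1080, 960, 544)
full_frame_w = obj * sx        # 16 px wide at the network input

# Experiment 1: a native 960x544 ROI -> no scaling at all.
roi_native_w = obj * 1.0       # stays 32 px

# Experiment 2: a 480x277 ROI upscaled to 960x544.
sx2, sy2 = scale_factor(480, 277, 960, 544)
roi_upscaled_w = obj * sx2     # 64 px, but interpolated: no new detail
```

The upscaled ROI gives the detector larger blobs but no additional information, which is consistent with the observed confidence drop; interpolation can also make the inputs look unlike the training distribution.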

I thought I could better understand why this happens if I could look at the frames output by nvdspreprocess. I tried adding a probe at nvdspreprocess's src pad and also at nvinfer's sink pad, but I was unable to access the frame in either case. This is my first time using nvdspreprocess, so let me know if my approach to solving this problem with it is wrong, or if I am doing something wrong when trying to access the frame from the nvdspreprocess output.

Apologies for the long post.

The plugin does not change the video frames. nvdspreprocess only outputs the tensor data in the metadata.

Please make sure you are familiar with GStreamer and have the necessary coding skills before you debug with DeepStream.

nvdspreprocess is fully open source: /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspreprocess.

I understand that nvdspreprocess outputs tensor data; I am asking whether there is a way to extract the ROI as a NumPy array so that I can save it to disk. The idea behind adding the probes was to access the frame from the GstBuffer.

I understand that this question is not a technical hurdle I am facing with DeepStream, but rather a deep-learning problem I am trying to solve using DeepStream. My understanding of GStreamer and DeepStream is at a working level, which is why I posted the question here on the forums, hoping for support from experts such as yourself. Can you tell me whether using nvdspreprocess will help me get better small-object detections, and whether I will be able to save the ROI to disk for further debugging?

If you want the data after the scaling, normalization, or format-conversion processing, that is the tensor data. Please get the data from the GstNvDsPreProcessBatchMeta user meta in the batch meta. NVIDIA DeepStream SDK API Reference: NvDsPreProcessTensorMeta Struct Reference | NVIDIA Docs

I need the image with the preprocessing applied to it.

If you want the data after all preprocessing (scaling, format conversion, and normalization) is done, it is tensor data; you can get it from the NvDsPreProcessTensorMeta user meta in the batch meta. NVIDIA DeepStream SDK API Reference: NvDsPreProcessTensorMeta Struct Reference | NVIDIA Docs

My application is Python-based, and NvDsPreProcessTensorMeta is not available in Python. Even if I access this from a C++ application, my interest is only in the frame image from the preprocessing; will I get that information from NvDsPreProcessTensorMeta?

There is no frame image after the scaling, format conversion, and normalization processing. The data is NHWC, NCHW, or NC, for model input.
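For completeness, the layouts mentioned above are just different flattenings of the same values into one buffer; a small illustrative sketch of the index arithmetic (toy sizes, not tied to any particular model):

```python
def nhwc_index(n, h, w, c, H, W, C):
    # Offset of element (n, h, w, c) in a flat NHWC buffer.
    return ((n * H + h) * W + w) * C + c

def nchw_index(n, c, h, w, C, H, W):
    # Offset of element (n, c, h, w) in a flat NCHW buffer.
    return ((n * C + c) * H + h) * W + w

# Toy 1x2x2x3 tensor: store each element as its (h, w, c) coordinates
# so we can check that both layouts address the same value.
H, W, C = 2, 2, 3
flat_nhwc = [(h, w, c) for h in range(H) for w in range(W) for c in range(C)]
flat_nchw = [(h, w, c) for c in range(C) for h in range(H) for w in range(W)]
```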

You can generate the Python bindings yourself: deepstream_python_apps/bindings at master · NVIDIA-AI-IOT/deepstream_python_apps
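As an alternative to binding the tensor meta: since nvdspreprocess leaves the video frames untouched, one way to save the ROI for debugging is to map the original frame in a pad probe and crop the ROI manually. The sketch below assumes the pyds bindings and an RGBA buffer upstream (e.g. after nvvideoconvert); the ROI coordinates are placeholder values, and the pyds calls only run inside a live pipeline.

```python
# Hedged sketch: saving the ROI from the original frame in a pad probe.

def crop_roi(frame, left, top, width, height):
    """Pure helper: crop a row/column window from a 2D-indexable frame."""
    return [row[left:left + width] for row in frame[top:top + height]]

def preprocess_src_probe(pad, info):
    # Only executed inside a running pipeline, so imports live here.
    import numpy as np
    import pyds
    from gi.repository import Gst

    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # Map the (unmodified) decoded frame to a NumPy array (RGBA).
        frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        # Example ROI: left=480, top=268, 960x544 (placeholder values;
        # keep these in sync with roi-params-src-0 in the config).
        roi = np.array(crop_roi(frame, 480, 268, 960, 544))
        # roi can now be written to disk, e.g. with cv2.imwrite
        # after an RGBA -> BGR conversion.
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK
```

Note this gives the ROI before the plugin's scaling/normalization; the post-processing values exist only as tensor data, as explained above.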

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.