Crop the buffer output from nvstreammux, then feed the cropped buffers into nvinfer for inference to detect small people objects

I need to run inference on traffic videos, which often contain extremely small people. To detect vehicles and people at such small sizes, I have tried the two methods below:
1. The first method is to add a videoconvert plugin before streammux (right after the source bin) in the pipeline.
On the OSD display, the whole frame is cropped and inference also runs only on the cropped frame, which is good.
However, what I really want is for the OSD display to show a combination of the inference results from both the original buffer and the cropped buffer. How can I implement this?

2. The second method is to call NvBufSurfTransform in the src pad probe function of streammux to crop the buffer output from streammux. The surface transform I wrote is similar to the implementation in gst_dsexample_transform_ip() and get_converted_mat() in dsexample. However, I found no difference in the inference results displayed on the OSD, and it seems nothing was actually cropped. It looks like dst_surface is never even passed downstream.
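To show detections from both the original frame and a cropped region on one OSD, every box found in the crop has to be mapped back into full-frame coordinates first. The sketch below shows only that coordinate arithmetic in plain C, assuming the crop was taken from a known source rectangle and scaled to a known size before inference. `Box`, `CropInfo` and `crop_to_frame` are hypothetical names, not DeepStream types; in a real pipeline the mapped values would presumably be written into an object's `rect_params`.

```c
#include <assert.h>

/* Hypothetical box type in pixel coordinates (not a DeepStream type). */
typedef struct {
    float left, top, width, height;
} Box;

/* Hypothetical crop description: the region taken from the original frame
 * and the size that region was scaled to before inference. */
typedef struct {
    Box src_rect;        /* region of the original frame that was cropped */
    float scaled_width;  /* width of the cropped buffer fed to inference  */
    float scaled_height; /* height of the cropped buffer fed to inference */
} CropInfo;

/* Map a box detected in the scaled crop back into original-frame
 * coordinates: undo the scaling, then add the crop's offset. */
static Box crop_to_frame(const Box *in_crop, const CropInfo *crop)
{
    assert(crop->scaled_width > 0.f && crop->scaled_height > 0.f);
    float sx = crop->src_rect.width / crop->scaled_width;
    float sy = crop->src_rect.height / crop->scaled_height;
    Box out;
    out.left = crop->src_rect.left + in_crop->left * sx;
    out.top = crop->src_rect.top + in_crop->top * sy;
    out.width = in_crop->width * sx;
    out.height = in_crop->height * sy;
    return out;
}
```

For example, with a 480x270 region at (960, 540) upscaled to 1920x1080, a detection at (400, 200) of size 40x80 in the crop maps to (1060, 590) of size 10x20 in the original frame. Objects detected in both the full-frame pass and the crop pass would still need de-duplication (for example by an IoU threshold); that step is not shown.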

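The components after streammux receive their input buffers from streammux, so a transform whose result goes into a separate destination surface inside the probe is never seen by them.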
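Since downstream elements only see the buffer that streammux pushes, a crop done in the probe can only take effect if its result is written back into that same buffer. The CPU analogy below illustrates this on a hypothetical single-plane 8-bit `Frame` (a stand-in, not `NvBufSurface`): it crops a rectangle, scales it back to full size with nearest-neighbour sampling into a scratch buffer, and copies the result into the original buffer. In DeepStream the analogous step would presumably be a transform into a scratch surface followed by a copy back into the batch surface; check the NvBufSurface/NvBufSurfTransform documentation for the exact calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-plane 8-bit frame (stand-in for one surface plane). */
typedef struct {
    uint8_t *data;
    int width, height, pitch; /* pitch = bytes per row, >= width */
} Frame;

/* Crop (left, top, cw, ch) out of `f`, scale it to the full frame size with
 * nearest-neighbour sampling, and write the result back into `f` itself, so
 * whoever holds `f` afterwards sees the cropped image. A scratch buffer is
 * used because writing in place would overwrite pixels before they are read.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
static int crop_scale_in_place(Frame *f, int left, int top, int cw, int ch)
{
    assert(f->pitch >= f->width);
    if (left < 0 || top < 0 || cw <= 0 || ch <= 0 ||
        left + cw > f->width || top + ch > f->height)
        return -1;
    uint8_t *scratch = malloc((size_t)f->width * (size_t)f->height);
    if (!scratch)
        return -1;
    for (int y = 0; y < f->height; y++) {
        int sy = top + (y * ch) / f->height; /* source row inside the crop */
        for (int x = 0; x < f->width; x++) {
            int sx = left + (x * cw) / f->width; /* source column */
            scratch[(size_t)y * f->width + x] = f->data[(size_t)sy * f->pitch + sx];
        }
    }
    /* Copy row by row so any padding bytes beyond `width` are left untouched. */
    for (int y = 0; y < f->height; y++)
        memcpy(f->data + (size_t)y * f->pitch, scratch + (size_t)y * f->width,
               (size_t)f->width);
    free(scratch);
    return 0;
}
```

Note that replacing the buffer contents this way means downstream inference sees only the crop, which matches the behaviour of the first method rather than combining both views.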

The SGIE crops its inputs as part of its preprocessing, but the PGIE does not support cropping. We will try to add "crop" support for the PGIE. The source code is open, so you can also try to modify it yourself.