Inference results differ when scaling is performed by nvstreammux

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) T4 GPU
• DeepStream Version 7.1
• TensorRT Version 10.7
• NVIDIA GPU Driver Version (valid for GPU only) 535.183.01
• Issue Type( questions, new requirements, bugs) Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I am encountering an accuracy discrepancy when using a custom-trained YOLOv6s model in a DeepStream pipeline. Specifically, the detection results differ depending on whether scaling is performed by nvstreammux or during the preprocessing stage within nvinfer. When rescaling is done by nvstreammux the detection drops are no longer observed, whereas when it is done by nvinfer I see failed detections.

Source - 1280x720 png images
Model - Yolov6s (640x640)
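
For reference, here is a minimal letterbox-geometry sketch (plain Python, illustration only, not DeepStream code) for a 1280x720 frame scaled into 640x640 with the aspect ratio kept and symmetric padding. If nvstreammux (enable-padding=1) and nvinfer (maintain-aspect-ratio=1, symmetric-padding=1) both follow this geometry, any remaining pixel-level difference would presumably come from the scaler/interpolation itself rather than from the layout.

# Letterbox geometry check (Python, illustration only)
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    scale = min(dst_w / src_w, dst_h / src_h)        # keep aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = dst_w - new_w, dst_h - new_h      # total padding
    # symmetric padding: split evenly between the two sides
    return scale, (pad_x // 2, pad_x - pad_x // 2), (pad_y // 2, pad_y - pad_y // 2)

print(letterbox_geometry(1280, 720, 640, 640))
# -> (0.5, (0, 0), (140, 140)): the scaled image occupies rows 140..499 of the 640x640 input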

# Scaling with nvstreammux
gst-launch-1.0 multifilesrc start-index=1 location=dt_issue%d.png ! pngdec ! videorate ! "video/x-raw,framerate=10/1" ! videoconvert ! nvvideoconvert ! \
capsfilter caps="video/x-raw(memory:NVMM), format=RGBA" !  m.sink_0 nvstreammux enable-padding=1 name=m width=640 height=640 batch-size=1 !   \
nvvideoconvert ! capsfilter caps="video/x-raw(memory:NVMM), format=RGBA" !    \
nvinfer config-file-path=/yolo/deepstream_cfg/yolov6_model_config_nvinfer.txt !    \
nvvideoconvert ! nvdsosd ! nvmultistreamtiler width=640 height=640 ! nvvideoconvert  ! nvv4l2h264enc bitrate=400000 \
! h264parse ! mp4mux ! filesink location=output.mp4

# Scaling with nvinfer
gst-launch-1.0 multifilesrc start-index=1 location=dt_issue%d.png ! pngdec ! videorate ! "video/x-raw,framerate=10/1" ! videoconvert ! nvvideoconvert ! \
capsfilter caps="video/x-raw(memory:NVMM), format=RGBA" !  m.sink_0 nvstreammux enable-padding=1 name=m width=1280 height=720 batch-size=1 !   \
nvvideoconvert ! capsfilter caps="video/x-raw(memory:NVMM), format=RGBA" !    \
nvinfer config-file-path=/yolo/deepstream_cfg/yolov6_model_config_nvinfer.txt !    \
nvvideoconvert ! nvdsosd ! nvmultistreamtiler width=1280 height=720 ! nvvideoconvert  ! nvv4l2h264enc bitrate=400000 \
! h264parse ! mp4mux ! filesink location=output.mp4

# Configuration file (yolov6_model_config_nvinfer.txt)
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=/yolo/deepstream_cfg/yolov6s_vehicle_2024_3_fp32_default.pt.onnx
model-engine-file=/yolo/deepstream_cfg/model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=/yolo/deepstream_cfg/labels_yolov6_vehicle.txt
batch-size=1
network-mode=0 # 0=FP32, 1=INT8, 2=FP16
num-detected-classes=1
interval=0
gie-unique-id=1
scaling-filter=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
workspace-size=2000
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=/yolo/deepstream_cfg/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
#post-cluster-threshold=0.4
topk=300
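
For completeness, the net-scale-factor above is just 1/255 and no offsets/mean file are configured, so nvinfer should only be normalizing pixels to [0, 1] before inference; a quick check (plain Python, illustration only):

# Normalization check (Python, illustration only)
net_scale_factor = 0.0039215697906911373

def normalize(x, mean=0.0):
    # nvinfer-style per-pixel transform: y = net-scale-factor * (x - mean)
    return net_scale_factor * (x - mean)

print(abs(net_scale_factor - 1.0 / 255.0) < 1e-8)  # True: effectively 1/255
print(normalize(255), normalize(0))                # ~1.0, 0.0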

Inference results [left: scaling by nvstreammux @ 640x640; right: nvstreammux @ 1280x720 with scaling by nvinfer]

References
• YOLOv6
• DeepStream Postprocessing

Is the “keep aspect ratio” needed when scaling for training the model?

Yes. The models have already been deployed in our production environment, so the possibility of retraining is slim.

Why did you set “enable-padding=1” with nvstreammux since you already set keep aspect ratio with nvinfer preprocessing?

I believe that might be a typo on my part; however, since the source images are 1280x720 and the streammux resolution is also 1280x720, it should have no effect, right?
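
A quick sanity check of that (plain Python, illustration only): with source and streammux both at 1280x720, the scale factor is 1 and there is nothing to pad.

# enable-padding no-op check (Python, illustration only)
src_w, src_h, dst_w, dst_h = 1280, 720, 1280, 720
scale = min(dst_w / src_w, dst_h / src_h)
print(scale)                                        # 1.0 -> no rescaling
print(dst_w - round(src_w * scale),
      dst_h - round(src_h * scale))                 # 0 0 -> no padding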

You can use the "dump-input-tensor=1" setting in your yolov6_model_config_nvinfer.txt to dump the input tensor data fed to the model and compare the difference between the two cases.
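
A possible way to compare the two dumps offline (numpy sketch; the file names are placeholders and the assumption that the dumps are raw float32 buffers of shape 3x640x640 is mine, so adjust to whatever files dump-input-tensor actually writes):

# Offline comparison of the two dumped input tensors (Python/numpy sketch)
import numpy as np

SHAPE = (3, 640, 640)  # assumed CHW float32 layout

def load_tensor(path):
    return np.fromfile(path, dtype=np.float32).reshape(SHAPE)

a = load_tensor("input_tensor_streammux_scaled.bin")  # pipeline 1 (640x640 mux), placeholder name
b = load_tensor("input_tensor_nvinfer_scaled.bin")    # pipeline 2 (720p mux), placeholder name

diff = np.abs(a - b)
print("max abs diff :", diff.max())
print("mean abs diff:", diff.mean())
# Rows with the largest differences hint at where the two scaling paths diverge
# (e.g. the padded letterbox borders vs. the image interior).
rows = diff.mean(axis=(0, 2))
print("worst rows   :", np.argsort(rows)[-5:])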