Input tensor is unexpectedly modified before being fed to the primary detector

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.4
• JetPack Version (valid for Jetson only)
• TensorRT Version:
• NVIDIA GPU Driver Version (valid for GPU only): Driver Version: 525.147.05
• Issue Type( questions, new requirements, bugs): bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

  • Why do we raise this issue?
    The primary detector gives different bounding boxes compared to the results obtained by directly calling a Triton Inference Server with the same model.engine file.

The bounding boxes are not entirely incorrect, but rather slightly shifted (and slightly worse), resembling floating-point processing errors.

  • Pipeline setup:

My pipeline:
uridecodebin -> nvstreammux -> queue -> nvinfer (primary detector)

  • Element config:
  1. Input source is a video of size H x W = 640 x 640.
  2. Streammux
      streammux.set_property("width", 640)
      streammux.set_property("height", 640)
      streammux.set_property("batch-size", num_sources)
      streammux.set_property("batched-push-timeout", self.config["batched-push-timeout"])
      streammux.set_property("enable-padding", 0)
      streammux.set_property("interpolation-method", 4)
  3. nvinfer (primary detector)
  gpu-id: 0
  net-scale-factor: 0.0039215697906911373
  offsets: 0;0;0
  model-color-format: 0
  onnx-file: ../models/peoplenet_yolov8x/yolov8x.onnx
  model-engine-file: ../models/peoplenet_yolov8x/yolov8x.onnx_b1_gpu0_fp32.engine
  labelfile-path: ../models/peoplenet_yolov8x/labels.txt
  batch-size: 1
  network-mode: 0
  num-detected-classes: 80
  interval: 0
  gie-unique-id: 1
  filter-out-class-ids: 1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30;31;32;33;34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79
  process-mode: 1
  network-type: 0
  cluster-mode: 2
  maintain-aspect-ratio: 0
  symmetric-padding: 0
  workspace-size: 1000
  parse-bbox-func-name: NvDsInferParseYolo
  custom-lib-path: ../custom_parser/
  output-tensor-meta: 1
  # engine-create-func-name: NvDsInferYoloCudaEngineGet
  crop-objects-to-roi-boundary: 1

  pre-cluster-threshold: 0.21666836936549047
  nms-iou-threshold: 0.5645207000469065
  minBoxes: 2
  dbscan-min-score: 0.693671458753017
  eps: 0.15584185873130887
  detected-min-w: 20
  detected-min-h: 20
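
With the config above, nvinfer's documented preprocessing is y = net-scale-factor * (x - offsets); since net-scale-factor ≈ 1/255 and the offsets are zero, the network input is simply pixel / 255. A numpy-only illustration (not DeepStream code):

```python
import numpy as np

# Illustrates nvinfer's preprocessing with the values from the config above:
# y = net-scale-factor * (x - offsets), where
# net-scale-factor = 0.0039215697906911373 ≈ 1/255 and offsets = 0;0;0.
NET_SCALE_FACTOR = 0.0039215697906911373
OFFSETS = np.array([0.0, 0.0, 0.0], dtype=np.float32)

def preprocess(frame_rgb):
    """Apply y = net-scale-factor * (x - offsets) to an HWC uint8 frame."""
    return NET_SCALE_FACTOR * (frame_rgb.astype(np.float32) - OFFSETS)

white = np.full((640, 640, 3), 255, dtype=np.uint8)
print(round(float(preprocess(white).max()), 4))  # ≈ 1.0
```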
  • Our effort/investigation:
  • We resized the video to 640x640 to match the width and height of streammux.
  • We tried different values of interpolation-method, but the difference still occurred.
  • We ran a simple test: by replacing the nvinfer element with nvinferserver using a Python backend, we were able to receive and dump the input tensors before they are fed to the detector. We saw that the input tensors differ slightly from the original frames extracted from the input video.
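The interpolation point can be illustrated outside of DeepStream: different resampling methods legitimately disagree at the pixel level, which is one plausible source of small tensor differences whenever any internal scaling happens. A numpy-only 1-D sketch (not DeepStream code):

```python
import numpy as np

# Upsample a 1-D row of pixel values with two interpolation methods and
# compare: they legitimately disagree, which is why changing
# interpolation-method shifts pixel values slightly.
row = np.array([0, 100, 200, 50], dtype=np.float32)
src_x = np.arange(len(row))
dst_x = np.linspace(0, len(row) - 1, 8)

linear = np.interp(dst_x, src_x, row)        # bilinear-style resampling
nearest = row[np.round(dst_x).astype(int)]   # nearest-neighbor resampling

print(float(np.abs(linear - nearest).max()) > 0)  # True: the methods differ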

Image 1: Second channel of the 1st frame, logged before being fed to the model in nvinferserver.
Image 2: Second channel of the 1st frame, extracted using opencv-python.

Video for testing: (5.8 MB)

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

  1. Could you share some screenshots to show the different bboxes?
  2. Please refer to this FAQ for Debug Tips for DeepStream Accuracy Issue.
  3. How did you log the preprocessed data to get those two screenshots?
  1. Let me synthesize them and send them to you later.
  2. Yes, I tried tuning those parameters. I can observe how the parameters work, but I still cannot make the bounding boxes match the results obtained from a standalone Triton Inference Server (with the same engine file).
  • The first image was obtained by writing a Python-backend nvinferserver, in which I added Python code to log the input tensors after multiplying by net-scale-factor and adding the offsets. The pipeline is now uridecodebin -> nvstreammux -> queue -> nvinferserver.
  • The second image was obtained by adding a probe to nvdsosd. The pipeline is now uridecodebin -> nvstreammux -> queue -> nvinfer (pgie) -> nvvideoconvert -> capsfilter -> nvdsosd. The additional elements behind the pgie are just for getting the image.
  • Interestingly, the frames of the input video read by cv2.VideoCapture() and the frames taken from both approaches differ in a pixel-to-pixel comparison.
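
The pixel-to-pixel comparison mentioned above can be quantified with a small helper like the following (a hypothetical sketch; the function name and the assumption that both inputs are same-shape HWC uint8 arrays are mine):

```python
import numpy as np

# Quantify pixel-to-pixel differences between a dumped input tensor and the
# cv2-decoded reference frame (both assumed HWC uint8, identical shape).
def pixel_diff_report(dumped, reference):
    diff = np.abs(dumped.astype(np.int16) - reference.astype(np.int16))
    return {
        "max_abs_diff": int(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "pct_pixels_changed": float((diff > 0).mean() * 100.0),
    }

a = np.zeros((4, 4, 3), dtype=np.uint8)
b = a.copy()
b[0, 0, 0] = 3  # perturb one value by 3 to simulate a small difference
print(pixel_diff_report(a, b))
```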

Config of the nvinferserver:

infer_config {
  unique_id: 5
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    trt_is {
      model_name: "peoplenet_yolov8x_py"
      version: -1
      model_repo {
        root: "/iva/model_repository_triton"
        log_level: 2
        tf_gpu_memory_fraction: 0.4
        tf_disable_soft_placement: 0
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    maintain_aspect_ratio: 0
    normalize {
      scale_factor: 1
      channel_offsets: [0, 0, 0]
    }
  }
  extra {
    copy_input_to_host_buffers: true
  }
  custom_lib {
    path: "/opt/nvidia/deepstream/deepstream/lib/"
  }
}
input_control {
  interval: 0
}
output_control {
  output_tensor_meta: true
}

Python code to get the image by adding a probe to nvdsosd:

gst_buffer = info.get_buffer()
if not gst_buffer:
    logging.error("Unable to get GstBuffer")
    return Gst.PadProbeReturn.OK

batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
l_frame = batch_meta.frame_meta_list
while l_frame is not None:
    try:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    except StopIteration:
        break
    img = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
    img_copy = np.array(img, copy=True, order='C')
    img_copy = cv2.cvtColor(img_copy, cv2.COLOR_RGBA2BGRA)
    try:
        l_frame = l_frame.next
    except StopIteration:
        break

capsfilter settings:

caps.set_property("caps", Gst.Caps.from_string("video/x-raw(memory:NVMM), format=RGBA"))

  1. About “directly calling a Triton Inference server”, do you mean you are using Python + Triton to do inference without DeepStream?
  2. You are doing inference with a DeepStream pipeline including nvinfer on the one hand, and with Python + Triton without DeepStream on the other, and the DeepStream results are worse. Am I right? Please refer to this yolov8 sample. Let’s focus on nvinfer in this topic if using nvinferserver also gives the worse results.
  3. In theory, if the bboxes are different, we need to compare the preprocessing data, the inference results, and the postprocessing data. Here is the method to dump preprocessing and postprocessing data.
  1. Yes.
  2. Yes, the output bounding boxes from DeepStream (A) are different from the output bounding boxes from the Triton Inference Server (B) mentioned in question 1. Both outputs A and B look reasonable; B seems worse than A.
  3. Thanks, I will try it.

By doing a 2D object detection evaluation on a dataset, I am now OK with the output bounding boxes from the DeepStream pipeline; it gives nearly the same performance as the results obtained from the Triton Inference Server with the same model file. Thanks.
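
For reference, the kind of IoU check underlying such a 2-D detection evaluation can be sketched as follows (the `[x1, y1, x2, y2]` box format and the helper name are my assumptions, not the poster's actual evaluation code):

```python
# Minimal IoU helper of the kind used in 2-D detection evaluation.
# Boxes are [x1, y1, x2, y2] in pixels.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [0, 0, 10, 10]))  # 1.0
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.1429
```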

Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!

Sorry for the late reply; this is no longer an issue for us, so I will close it. Thanks for the support.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.