Object detection NMS results are different between nvinfer and nvinferserver

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
6.3
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
535.113.01
• Issue Type( questions, new requirements, bugs)
Bug/Question

I am running a DeepStream pipeline with YOLOv5 object detection and NMS clustering. When comparing the outputs between nvinfer and nvinferserver, the returned results differ. After following this thread and applying the patch, the bounding boxes for detected objects returned by the NvDsInferParseYolo method are identical between the two, so the issue appears to lie somewhere in the NMS clustering.

My config file for my nvinfer pipeline is:

property:
  gpu-id: 0
  net-scale-factor: 0.0039215697906911373 # 1/255
  # onnx-file: 
  model-engine-file: /home/DockerVolumeMount/models/tool_detection/vic_yolov5n_09_29_2023.trt
  labelfile-path: /home/DockerVolumeMount/exposure-gst-plugin/misc/tool-detection-labels.txt
  batch-size: 1
  process-mode: 1
  model-color-format: 0
  network-mode: 0
  network-type: 0 # 0 for object detection
  maintain-aspect-ratio: 1 # I think we want this for yolov5
  symmetric-padding: 1
  interval: 1 # How many batches to skip between inference calls
  gie-unique-id: 5 # 5 for tool detection model
  # output-tensor-meta: 1 # Attach raw tensor output to Gst Buffer metadata
  
  # object detection specific
  num-detected-classes: 3
  cluster-mode: 2

  # custom yolo implementation specific
  parse-bbox-func-name: NvDsInferParseYolo
  custom-lib-path: /home/DockerVolumeMount/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
  # engine-create-func-name: NvDsInferYoloCudaEngineGet # Can use for yolo TRT engine creation

class-attrs-all:
  nms-iou-threshold: 0.45
  pre-cluster-threshold: 0.25
  topk: 1

My nvinferserver config is:

infer_config {
  unique_id: 5
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "tool_detection"
      version: -1
      grpc{
        url: "localhost:8001"
        enable_cuda_buffer_sharing: 1 # May not be supported on Jetson according to the documentation; also appears to copy the buffers rather than just passing them to the server (not sure on either point)
      }
    }
  }

  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_NONE
    # tensor_order: TENSOR_ORDER_LINEAR
    maintain_aspect_ratio: 1
    symmetric_padding: 1
    # tensor_name: "input"
    # frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
    # frame_scaling_filter: 1
    normalize {
      scale_factor: 0.0039215697906911373
      channel_offsets: [0.0,0.0,0.0]
    }
  }

  postprocess {
    labelfile_path: "/home/DockerVolumeMount/exposure-gst-plugin/misc/tool-detection-labels.txt"
    detection {
      num_detected_classes: 3
      custom_parse_bbox_func: "NvDsInferParseYolo"

      # per_class_params [
      #     { key: 0, value { pre_threshold : 0.25} }, 
      #     { key: 1, value { pre_threshold : 0.25} }, 
      #     { key: 2, value { pre_threshold : 0.25} }
      # ]

      nms {
        confidence_threshold: 0.25 # Tested and found same as per_class_params pre_threshold
        iou_threshold: 0.45
        topk: 1
      }
    }
  }
  # extra {
  #   copy_input_to_host_buffers: true
  # }
  custom_lib {
    path: "/home/DockerVolumeMount/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so"
  }
}

input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 1
}

I notice that the NMS topk parameter no longer applies per class but to all detections: with nvinferserver, setting it to 1 keeps only the single highest-confidence detection across all classes, whereas with nvinfer, setting it to 1 keeps the highest-confidence detection for each individual class. My first question is whether topk can be applied per class in nvinferserver, or whether its meaning has changed in any other way.
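For clarity, here is a minimal sketch of the two interpretations as I understand them (my own C++ illustration, not the plugin source; the Detection struct is only for the example):

#include <algorithm>
#include <map>
#include <vector>

struct Detection {
    int classId;
    float confidence;
    // box fields omitted for brevity
};

// nvinfer-style interpretation: keep the top-k detections per class.
std::vector<Detection> topkPerClass(std::vector<Detection> dets, size_t topk) {
    std::map<int, std::vector<Detection>> byClass;
    for (const auto &d : dets) byClass[d.classId].push_back(d);

    std::vector<Detection> kept;
    for (auto &kv : byClass) {
        auto &v = kv.second;
        std::sort(v.begin(), v.end(), [](const Detection &a, const Detection &b) {
            return a.confidence > b.confidence;
        });
        if (v.size() > topk) v.resize(topk);
        kept.insert(kept.end(), v.begin(), v.end());
    }
    return kept;
}

// nvinferserver-style interpretation: keep the top-k detections overall.
std::vector<Detection> topkGlobal(std::vector<Detection> dets, size_t topk) {
    std::sort(dets.begin(), dets.end(), [](const Detection &a, const Detection &b) {
        return a.confidence > b.confidence;
    });
    if (dets.size() > topk) dets.resize(topk);
    return dets;
}

With topk set to 1 and three classes present in a frame, the first version can keep up to three objects (one per class) while the second keeps exactly one, which matches what I observe.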

Along with that, I notice that the nvinferserver pipeline returns objects on frames where the nvinfer pipeline does not, even though the detected objects returned by the same post-processing method NvDsInferParseCustomYolo are identical between the two. My second question is whether this is expected behavior or not.

I’m not sure whether this is a misunderstanding on my part about how the parameters map between the nvinfer and nvinferserver configs, since they have changed, or whether it is a bug. I believe it comes down to differences in the NMS clustering results, one of which (the topk behavior) is intentional.

I am checking.

In nvinfer, you can set a topk for each class; in nvinferserver, only one topk setting is supported per configuration file. Please refer to /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton/config_infer_plan_engine_primary.txt.
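For illustration, per-class settings in an nvinfer YAML config go into class-attrs-<class-id> groups, for example (the values below are only placeholders):

class-attrs-0:
  topk: 1
  nms-iou-threshold: 0.45
  pre-cluster-threshold: 0.25

class-attrs-1:
  topk: 1
  nms-iou-threshold: 0.45
  pre-cluster-threshold: 0.25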

What do you mean by " I notice that nvinferserver pipeline returns objects on frames that nvinfer pipeline does not"? How can we reproduce this issue? Thanks!

Okay, so from my understanding the topk behavior is different for nvinfer and nvinferserver and there is no way around it; that's fine.

On the nvinferserver pipeline returning objects on frames that nvinfer does not, what I mean is this: the detected object boxes are the same between the two (as returned by the shared library for custom post-processing), but the final detected objects in NvDsObjectMeta are not. I believe this is due to a difference in the NMS clustering algorithm. I've checked the configuration parameter options and they appear to be the same, so I think the difference might be somewhere inside the NMS clustering code.
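In case it helps to reproduce, here is a sketch of the kind of pad probe that can be attached downstream of nvinfer/nvinferserver to dump the final NvDsObjectMeta per frame and diff the two pipelines (the attachment point and log format are my own choices, not from any existing sample):

#include <gst/gst.h>
#include "gstnvdsmeta.h"

/* Pad probe for the src pad of nvinfer / nvinferserver: logs class id,
 * confidence and box for every object left after clustering. */
static GstPadProbeReturn
dump_detections_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj; l_obj = l_obj->next) {
      NvDsObjectMeta *obj = (NvDsObjectMeta *) l_obj->data;
      g_print ("frame %d: class %d conf %.3f box [%.1f, %.1f, %.1f, %.1f]\n",
               frame_meta->frame_num, obj->class_id, obj->confidence,
               obj->rect_params.left, obj->rect_params.top,
               obj->rect_params.width, obj->rect_params.height);
    }
  }
  return GST_PAD_PROBE_OK;
}

Attaching the same probe to both pipelines and diffing the logs makes the extra objects on some frames easy to spot.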

Unfortunately I don't have a simple minimal working example to post here, so I was hoping someone could check, with the same object detection model (any model) and NMS clustering, whether the results are the same between nvinfer and nvinferserver.

After running deepstream-app with source2_1080p_dec_infer-resnet_demux_int8.txt using both nvinfer and nvinferserver, I can't reproduce this issue. nvinfer and nvinferserver are open source; the NMS functionality is in DetectPostprocessor::clusterAndFillDetectionOutputNMS.
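For reference, greedy NMS follows the usual per-class pattern, roughly like this (a simplified sketch, not the actual DeepStream source):

#include <algorithm>
#include <vector>

struct Box { float left, top, width, height; float confidence; };

static float iou (const Box &a, const Box &b)
{
  float x1 = std::max (a.left, b.left);
  float y1 = std::max (a.top, b.top);
  float x2 = std::min (a.left + a.width, b.left + b.width);
  float y2 = std::min (a.top + a.height, b.top + b.height);
  float inter = std::max (0.0f, x2 - x1) * std::max (0.0f, y2 - y1);
  float uni = a.width * a.height + b.width * b.height - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

// Standard greedy NMS over the boxes of one class: keep the highest-confidence
// box, drop every remaining box that overlaps it above the IoU threshold, repeat.
std::vector<Box> greedyNms (std::vector<Box> boxes, float iouThreshold)
{
  std::sort (boxes.begin (), boxes.end (),
             [] (const Box &a, const Box &b) { return a.confidence > b.confidence; });
  std::vector<Box> kept;
  for (const Box &candidate : boxes) {
    bool suppressed = false;
    for (const Box &k : kept)
      if (iou (candidate, k) > iouThreshold) { suppressed = true; break; }
    if (!suppressed)
      kept.push_back (candidate);
  }
  return kept;
}

If both plugins receive the same boxes and the same thresholds, this step should produce the same survivors, so comparing the inputs to the clustering in both plugins is a good place to start.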

Okay, thank you for looking into it @fanzh, I will explore more on my end to see how I might have things set up differently between the two plugins.

Sorry for the late reply. Is this still a DeepStream issue that needs support? Thanks!

Apologies, I have been pulled in other directions and haven't been able to look at this further. When I return to it sometime in the future I will check whether the issue still exists, and I can open a new topic in that case, thanks.

OK. If you need further support, please open a new topic. Thanks!