NVDCF Tracker + Re-ID Performance Profile

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) A5000
• DeepStream Version 6.3
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.5
• NVIDIA GPU Driver Version (valid for GPU only) 525
• Issue Type( questions, new requirements, bugs) questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

  1. Follow the setup and run the example in deepstream_tao_apps/apps/tao_others/README.md (branch release/tao4.0_ds6.3ga of NVIDIA-AI-IOT/deepstream_tao_apps on GitHub).
  2. I tweaked the example and ended up using the following configuration:
application:
  enable-perf-measurement: 1
  perf-measurement-interval-sec: 5

tiled-display:
  enable: 0
  rows: 1
  columns: 1
  width: 1024
  height: 640
  gpu-id: 0
  #(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
  #(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
  #(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
  #(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
  #(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
  nvbuf-memory-type: 0

# This is where you define your input video
source:
 csv-file-path: deepstream_tao_apps/configs/app/sources.csv

sink0:
  enable: 0
  #Type - 1=FakeSink 2=EglSink 3=File
  type: 1
  sync: 1
  source-id: 0
  gpu-id: 0
  nvbuf-memory-type: 0

sink1:
  enable: 1
  #Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvdrmvideosink 6=MsgConvBroker
  type: 3
  #1=mp4 2=mkv
  container: 1
  #1=h264 2=h265 3=mpeg4
  ## only SW mpeg4 is supported right now.
  codec: 1
  sync: 0
  bitrate: 2000000
  output-file: out.mp4
  source-id: 0

osd:
  enable: 1
  gpu-id: 0
  border-width: 3
  text-size: 15
  text-color: 1;1;1;1
  text-bg-color: 0.3;0.3;0.3;1
  font: Arial
  show-clock: 0
  clock-x-offset: 800
  clock-y-offset: 820
  clock-text-size: 12
  clock-color: 1;0;0;0
  nvbuf-memory-type: 0
  display-text: 1


streammux:
  gpu-id: 0
  ##Boolean property to inform muxer that sources are live
  live-source: 0
  ## batch-size must be identical to the number of input sources.
  batch-size: 1
  ##time out in usec, to wait after the first buffer is available
  ##to push the batch even if the complete batch is not formed
  batched-push-timeout: 40000
  ## Set muxer output width and height
  width: 1024
  height: 640 
  enable-padding: 0
  nvbuf-memory-type: 0

primary-gie:
  enable: 1
  plugin-type: 0
  gie-unique-id: 1
  bbox-border-color0: 1;0;0;1
  bbox-border-color1: 0;1;1;1
  bbox-border-color2: 0;1;1;1
  bbox-border-color3: 0;1;0;1
  config-file: ${my-own-config,which I cannot share}

tracker:
  enable: 1
  gpu-id: 0
  tracker-width: 1024 
  tracker-height: 640 
  ll-lib-file: /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
  ll-config-file: config_tracker_NvDCF_accuracy.yml
  enable-batch-process: 1
  enable-past-frame: 1
  display-tracking-id: 1



tests:
  file-loop: 0

For the tracker, I used config_tracker_NvDCF_accuracy.yml, only swapping the Re-ID model for an ONNX one:

BaseConfig:
  minDetectorConfidence: 0.1894    # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking

TargetManagement:
  enableBboxUnClipping: 1    # In case the bbox is likely to be clipped by image border, unclip bbox
  preserveStreamUpdateOrder: 0    # When assigning new target ids, preserve input streams' order to keep target ids in a deterministic order over multiple runs
  maxTargetsPerStream: 150    # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity

  # [Creation & Termination Policy]
  minIouDiff4NewTarget: 0.3686    # If the IOU between the newly detected object and any of the existing targets is higher than this threshold, this newly detected object will be discarded.
  minTrackerConfidence: 0.1513    # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]
  probationAge: 2    # If the target's age exceeds this, the target will be considered to be valid.
  maxShadowTrackingAge: 42    # Max length of shadow tracking. If the shadowTrackingAge exceeds this limit, the tracker will be terminated.
  earlyTerminationAge: 1    # If the shadowTrackingAge reaches this threshold while in TENTATIVE period, the target will be terminated prematurely.

TrajectoryManagement:
  useUniqueID: 1    # Use 64-bit long Unique ID when assigning tracker ID. Default is [true]
  enableReAssoc: 1    # Enable Re-Assoc

  # [Re-Assoc Metric: Thresholds for valid candidates]
  minMatchingScore4Overall: 0.6622    # min matching score for overall
  minTrackletMatchingScore: 0.2940    # min tracklet similarity score for re-assoc
  minMatchingScore4ReidSimilarity: 0.0771    # min reid similarity score for re-assoc

  # [Re-Assoc Metric: Weights]
  matchingScoreWeight4TrackletSimilarity: 0.7981    # weight for tracklet similarity score
  matchingScoreWeight4ReidSimilarity: 0.3848    # weight for reid similarity score

  # [Re-Assoc: Motion-based]
  minTrajectoryLength4Projection: 34    # min trajectory length required to make projected trajectory
  prepLength4TrajectoryProjection: 58    # the length of the trajectory during which the state estimator is updated to make projections
  trajectoryProjectionLength: 33    # the length of the projected trajectory
  maxAngle4TrackletMatching: 67    # max angle difference for tracklet matching [degree]
  minSpeedSimilarity4TrackletMatching: 0.0574    # min speed similarity for tracklet matching
  minBboxSizeSimilarity4TrackletMatching: 0.1013    # min bbox size similarity for tracklet matching
  maxTrackletMatchingTimeSearchRange: 27    # the search space in time for max tracklet similarity
  trajectoryProjectionProcessNoiseScale: 0.0100    # trajectory projector's process noise scale w.r.t. state estimator
  trajectoryProjectionMeasurementNoiseScale: 100    # trajectory projector's measurement noise scale w.r.t. state estimator
  trackletSpacialSearchRegionScale: 0.0100    # the search region scale for peer tracklet

  # [Re-Assoc: Reid based. Reid model params are set in ReID section]
  reidExtractionInterval: 8    # frame interval to extract reid features per target

DataAssociator:
  dataAssociatorType: 0    # the type of data associator among { DEFAULT= 0 }
  associationMatcherType: 1    # the type of matching algorithm among { GREEDY=0, CASCADED=1 }
  checkClassMatch: 1    # If checked, only the same-class objects are associated with each other. Default: true

  # [Association Metric: Thresholds for valid candidates]
  minMatchingScore4Overall: 0.0222    # Min total score
  minMatchingScore4SizeSimilarity: 0.3552    # Min bbox size similarity score
  minMatchingScore4Iou: 0.0548   # Min IOU score
  minMatchingScore4VisualSimilarity: 0.5043    # Min visual similarity score

  # [Association Metric: Weights]
  matchingScoreWeight4VisualSimilarity: 0.3951    # Weight for the visual similarity (in terms of correlation response ratio)
  matchingScoreWeight4SizeSimilarity: 0.6003    # Weight for the Size-similarity score
  matchingScoreWeight4Iou: 0.4033    # Weight for the IOU score

  # [Association Metric: Tentative detections] only uses iou similarity for tentative detections
  tentativeDetectorConfidence: 0.1024    # If a detection's confidence is lower than this but higher than minDetectorConfidence, then it's considered as a tentative detection
  minMatchingScore4TentativeIou: 0.2852    # Min iou threshold to match targets and tentative detection

StateEstimator:
  stateEstimatorType: 1    # the type of state estimator among { DUMMY=0, SIMPLE=1, REGULAR=2 }

  # [Dynamics Modeling]
  processNoiseVar4Loc: 6810.8668    # Process noise variance for bbox center
  processNoiseVar4Size: 1541.8647    # Process noise variance for bbox size
  processNoiseVar4Vel: 1348.4874    # Process noise variance for velocity
  measurementNoiseVar4Detector: 100.0000    # Measurement noise variance for detector's detection
  measurementNoiseVar4Tracker: 293.3238    # Measurement noise variance for tracker's localization

VisualTracker:
  visualTrackerType: 1    # the type of visual tracker among { DUMMY=0, NvDCF=1 }

  # [NvDCF: Feature Extraction]
  useColorNames: 1    # Use ColorNames feature
  useHog: 1    # Use Histogram-of-Oriented-Gradient (HOG) feature
  featureImgSizeLevel: 3    # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
  featureFocusOffsetFactor_y: -0.1054    # The offset for the center of hanning window relative to the feature height. The center of hanning window would move by (featureFocusOffsetFactor_y*featureMatSize.height) in vertical direction

  # [NvDCF: Correlation Filter]
  filterLr: 0.0767    # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
  filterChannelWeightsLr: 0.0339    # learning rate for the channel weights among feature channels. Valid Range: [0.0, 1.0]
  gaussianSigma: 0.5687    # Standard deviation for Gaussian for desired response when creating DCF filter [pixels]


ReID:
  reidType: 2    # The type of reid among { DUMMY=0, NvDEEPSORT=1, Reid based reassoc=2, both NvDEEPSORT and reid based reassoc=3}

  # [Reid Network Info]
  batchSize: 100    # Batch size of reid network
  workspaceSize: 1000    # Workspace size to be used by reid engine, in MB
  reidFeatureSize: 512    # Size of reid feature
  reidHistorySize: 100    # Max number of reid features kept for one object
  inferDims: [3, 384, 128]    # Reid network input dimension CHW or HWC based on inputOrder
  networkMode: 1    # Reid network inference precision mode among {fp32=0, fp16=1, int8=2 }

  # [Input Preprocessing]
  inputOrder: 0    # Reid network input order among { NCHW=0, NHWC=1 }. Batch will be converted to the specified order before reid input.
  colorFormat: 0    # Reid network input color format among {RGB=0, BGR=1 }. Batch will be converted to the specified color before reid input.
  offsets: [109.1250, 102.6000, 91.3500] #[123.6750, 116.2800, 103.5300]    # Array of values to be subtracted from each input channel, with length equal to number of channels
  netScaleFactor: 0.01742919 #0.01735207    # Scaling factor for reid network input after subtracting offsets
  keepAspc: 1    # Whether to keep aspect ratio when resizing input objects for reid

  # [Output Postprocessing]
  # addFeatureNormalization: 1 # If reid feature is not normalized in network, adding normalization on output so each reid feature has l2 norm equal to 1

  # [Paths and Names]
  onnxFile: "deepstream_tao_apps/models/reidentificationnet/ghost_reid.onnx" 
  modelEngineFile: "deepstream_tao_apps/models/reidentificationnet/ghost_reid.onnx_b100_gpu0_fp16.engine"
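For reference, the offsets and netScaleFactor above follow DeepStream's usual per-channel normalization, y = netScaleFactor * (x - offset), applied to each input channel. A minimal sketch using the values from this config:

```python
# Per-channel input normalization as the ReID preprocessing applies it:
# normalized = netScaleFactor * (pixel - offset), one offset per channel.
offsets = [109.1250, 102.6000, 91.3500]   # values from the config above
net_scale_factor = 0.01742919

def preprocess_pixel(rgb):
    """Normalize one RGB pixel (channel values in [0, 255])."""
    return [net_scale_factor * (v - o) for v, o in zip(rgb, offsets)]
```

A pixel equal to the offsets maps to zero in every channel; note that 1 / 0.01742919 ≈ 57.4 ≈ 0.225 × 255, i.e. a standard deviation of 0.225 in [0, 1] units.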

I believe you should be able to reproduce what I got.

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I ran this pipeline to process a 1024 x 640 video on an A5000, which resulted in 195 FPS, only a 5-10 FPS drop compared with not using the tracker.
However, based on your documentation: https://docs.nvidia.com/metropolis/deepstream/5.0/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_performance.html#wwpID0E0ND0HA, I expected the performance to drop significantly.

I have 3 questions:

  1. Is this expected?
  2. I stopped seeing this kind of NvDCF performance profile after DS 5.0. Do you have any newer numbers? If so, could you share them? It would be very valuable to see the performance profile (FPS) on more recent GPUs, such as the A100 and A6000, in a table similar to the one in the DS 5.0 docs.
  3. How do I know that NvDCF is loaded instead of falling back to another tracker? My biggest worry is that the config I specified was not loaded.
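For question 3, one sketch: run the app with its output teed to a log (e.g. `deepstream-app -c app.yml 2>&1 | tee run.log`) and scan the log for the low-level tracker init lines. The patterns below are assumptions based on typical nvtracker startup output; adjust them to whatever your DeepStream version actually prints.

```python
# Check a captured deepstream-app log for nvtracker init messages.
# NOTE: the two patterns are assumed/illustrative, not guaranteed verbatim.
import re

EXPECTED_PATTERNS = [
    r"libnvds_nvmultiobjecttracker\.so",  # low-level lib path echoed at load time
    r"\[NvMultiObjectTracker\]",          # messages emitted by the multi-object tracker lib
]

def tracker_loaded(log_text: str) -> bool:
    """True only if every expected init message appears in the log."""
    return all(re.search(p, log_text) for p in EXPECTED_PATTERNS)
```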

The NvDCF accuracy profile is tuned for best accuracy, so its performance is lower. Can you use config_tracker_NvDCF_max_perf.yml for the performance test? Please also upgrade to the latest DeepStream version, 6.4. Regarding the nvtracker performance table, let me check and get back to you later.

@kesong you may have misunderstood my question. What I meant is the opposite: the performance I observed is actually very good, which differs from what I expected based on the previous doc.

And yes, if you could share some NvDCF (especially with Re-ID) tracker profiling (compared with no tracker) on more recent GPUs, like the A6000, that would be very helpful.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Hello kyuan2023,

I notice that you are writing the output video stream to a file in the pipeline. Depending on the file I/O and encoder throughput, your 195 FPS may be bound by that rather than by the GPU. If so, it would imply that the GPU is not fully utilized, so the tracker can use it without degrading throughput much. I would recommend checking your GPU utilization with and without the tracker; I would expect a noticeable difference.
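To compare GPU utilization with and without the tracker, here is a minimal sampling sketch (it assumes nvidia-smi is on PATH; the query flags are standard nvidia-smi options):

```python
# Hedged sketch: average GPU utilization while the pipeline runs, sampled via
# nvidia-smi's query mode. Run once with the tracker enabled and once with it
# disabled, then compare the two averages.
import statistics
import subprocess
import time

def parse_util(csv_line: str) -> int:
    """Parse one line of `nvidia-smi --query-gpu=utilization.gpu
    --format=csv,noheader,nounits` output into an integer percentage."""
    return int(csv_line.strip())

def sample_gpu_util(seconds: float = 10.0, interval: float = 0.5) -> float:
    """Average GPU 0 utilization (%) over `seconds`."""
    samples = []
    end = time.time() + seconds
    while time.time() < end:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        samples.append(parse_util(out.splitlines()[0]))
        time.sleep(interval)
    return statistics.mean(samples)
```

Start the pipeline, call `sample_gpu_util()` in a separate process while it runs, and repeat with `enable: 0` under `tracker:`.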

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.