DCF Filter Learning doesn't seem to work as expected

Hello,

• Hardware Platform: Jetson Xavier NX
• DeepStream Version: 6.1.1
• JetPack Version: 5.0.2

I’m trying to tune nvtracker according to the DeepStream Developer Guide, mainly the filterLr, filterChannelWeightsLr and gaussianSigma parameters.

My understanding is that these parameters control the rate of a learning process that should increase the confidence over time: “If the visual appearance of the target objects is expected to vary quickly over time, one may employ a high learning rate for better adaptation of the correlation filter to the changing appearance”.
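
To make sure we’re talking about the same mechanism, here is my (toy) mental model of what filterLr does, based on the “exponential moving average” wording in the sample tracker config comments. The actual NvDCF internals aren’t public, so this is just an assumption on my part:

import numpy as np

def update_dcf_filter(prev_filter: np.ndarray, new_filter: np.ndarray, filter_lr: float) -> np.ndarray:
    # Exponential moving average: blend the running correlation filter with the
    # filter estimated from the current frame. filter_lr = 0.0 means never adapt,
    # filter_lr = 1.0 means forget the past entirely.
    return (1.0 - filter_lr) * prev_filter + filter_lr * new_filter

With filterLr: 0.2 I would expect that, once the object stops changing, a few updates are enough for the filter to re-converge on the current appearance and for the confidence to climb back up, but that’s not what I’m seeing.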

No matter how much I tweak these parameters, the confidence gets lower when the object changes “away” from the initial appearance, and higher when it changes “towards” the initial appearance (which makes sense). But I’m not seeing any improvement in confidence when the object’s visual appearance stays static, even over extended periods.

Any help will be much appreciated.

Can you share how you checked it? Can you also share the test video, so we can reproduce it and have a check?

Hello @kesong,

Sure, I’ll arrange a test video. I’m simply starting a track on a specific object and adding the confidence to the OSD. Let’s say the confidence starts at ~0.7; then I change the object’s attitude or rotate it a bit and the confidence goes down to around ~0.4, but when I hold it still (after that attitude change) the confidence never goes up.

In the meantime, can you please tell me whether my assumption that DCF Filter Learning increases the confidence over time is right?

Thank you.

Hi @kesong, can you please verify my claim regarding DCF Filter Learning’s behavior?

In theory, yes. But I need your video to check the details.

Hello @kesong!

Please see the attached video; it shows:

  • Confidence starts around 0.6 (the first 2 seconds are missing)
  • Confidence goes down as the object is rotated
  • Confidence stays static around 0.27
  • Confidence goes up as the object is rotated back to its original attitude

20230323-201559.ts (20.8 MB)

The expected behavior when DCF Filter Learning is enabled is for the confidence to crawl back up when the object is static (as described here), but unfortunately it doesn’t.

Following is the config used for the video. I also tried higher values for filterLr and filterChannelWeightsLr but got the same results.

BaseConfig:
  minDetectorConfidence: 1.0   # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking

TargetManagement:
  enableBboxUnClipping: 0   # In case the bbox is likely to be clipped by image border, unclip bbox
  preserveStreamUpdateOrder: 0 # When assigning new target ids, preserve input streams' order to keep target ids in a deterministic order over multiple runs
  maxTargetsPerStream: 1  # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity

  # [Creation & Termination Policy]
  minIouDiff4NewTarget: 0.5   # If the IOU between the newly detected object and any of the existing targets is higher than this threshold, this newly detected object will be discarded.
  minTrackerConfidence: 0.0   # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]
  probationAge: 0             # If the target's age exceeds this, the target will be considered to be valid.
  maxShadowTrackingAge: 0    # Max length of shadow tracking. If the shadowTrackingAge exceeds this limit, the tracker will be terminated.
  earlyTerminationAge: 1   # If the shadowTrackingAge reaches this threshold while in TENTATIVE period, the target will be terminated prematurely.

TrajectoryManagement:
  useUniqueID: 0   # Use 64-bit long Unique ID when assigning tracker ID.
  enableReAssoc: 0    # Enable Re-Assoc

DataAssociator:
  dataAssociatorType: 0 # the type of data associator among { DEFAULT= 0 }
  associationMatcherType: 0 # the type of matching algorithm among { GREEDY=0, GLOBAL=1 }
  checkClassMatch: 0  # If checked, only the same-class objects are associated with each other. Default: true

  # [Association Metric: Thresholds for valid candidates]
  minMatchingScore4Overall: 0.0   # Min total score
  minMatchingScore4SizeSimilarity: 0.6  # Min bbox size similarity score
  minMatchingScore4Iou: 0.0             # Min IOU score
  minMatchingScore4VisualSimilarity: 0.7  # Min visual similarity score

  # [Association Metric: Weights]
  matchingScoreWeight4VisualSimilarity: 0.6  # Weight for the visual similarity (in terms of correlation response ratio)
  matchingScoreWeight4SizeSimilarity: 0.1    # Weight for the Size-similarity score
  matchingScoreWeight4Iou: 0.3   # Weight for the IOU score

StateEstimator:
  stateEstimatorType: 1  # the type of state estimator among { DUMMY=0, SIMPLE=1, REGULAR=2 }

  # [Dynamics Modeling]
  processNoiseVar4Loc: 3.0    # Process noise variance for bbox center
  processNoiseVar4Size: 1.0   # Process noise variance for bbox size
  processNoiseVar4Vel: 0.1    # Process noise variance for velocity
  measurementNoiseVar4Detector: 2.0    # Measurement noise variance for detector's detection
  measurementNoiseVar4Tracker: 16.0    # Measurement noise variance for tracker's localization

VisualTracker:
  visualTrackerType: 1 # the type of visual tracker among { DUMMY=0, NvDCF=1 }

  # [NvDCF: Feature Extraction]
  useColorNames: 0     # Use ColorNames feature
  useHog: 1            # Use Histogram-of-Oriented-Gradient (HOG) feature
  featureImgSizeLevel: 3  # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
  featureFocusOffsetFactor_y: -0.2 # The offset for the center of hanning window relative to the feature height. The center of hanning window would move by (featureFocusOffsetFactor_y*featureMatSize.height) in vertical direction

  # [NvDCF: Correlation Filter]
  filterLr: 0.2 # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
  filterChannelWeightsLr: 0.3 # learning rate for the channel weights among feature channels. Valid Range: [0.0, 1.0]
  gaussianSigma: 0.7 # Standard deviation for Gaussian for desired response when creating DCF filter [pixels]

Thank you in advance.

I suppose you have a PGIE to detect the object. Can you share the whole pipeline and the detection model? The detector’s output bbox will also impact the confidence.

Hello @kesong,

The bbox size is 60 by 60 pixels. The pipeline is very complex so it can’t be shared, but the tracker is self-contained with its own config file, which I included above.

The video clearly shows the correlation between the object’s attitude and the confidence; unfortunately, there’s no “learning” process that affects the confidence over time. It looks like filterLr, filterChannelWeightsLr and gaussianSigma don’t make any difference to the confidence.

I’m on DeepStream 6.1.1 as mentioned, and I again refer to the documentation regarding “DCF Filter Learning”. Can you please investigate? Maybe something needs to be changed in the config?

Thank you!

EDIT: just to be clear, the video shows a single “track” bbox; there’s no PGIE sending new bboxes at intervals. This way it’s very easy to troubleshoot tracker issues.

Can you try enabling useColorNames and disabling useHog? I suppose HOG is related to orientation.

The NvDCF tracker’s filter update happens only when there’s a match to a PGIE detection. But, per the user’s comments, that is not happening. So what the user is observing is expected behavior.

To allow the filter update, please make sure there’s a detection bbox so that data association happens. Again, the filter update takes place only when there’s a successful match.

Another thing: given that your detected object is a rigid object (i.e., a wheel), I would recommend setting featureFocusOffsetFactor_y: 0 instead of featureFocusOffsetFactor_y: -0.2.

Hello @pshin,

I’m trying to use nvtracker independently, without a PGIE. For reference, there’s VPI’s KLT Feature Tracker, but I couldn’t get it to perform well while the object is gradually changing its appearance/orientation.

So currently I’m trying to use nvtracker to track a one-time bbox and keep tracking even when the object is gradually changing its appearance/orientation. It works well when the object is simply moving around, but when things start changing, the confidence starts dropping until the track is lost.

Any thoughts?

Thank you.

The NvDCF tracker filter won’t update in your use case, as only a one-time bbox is sent to the tracker. So you will get a dropping tracker confidence when you rotate the object. This is expected behavior. Please send a correct bbox to the tracker at some interval, so the NvDCF tracker filter will be updated to match the rotated object; then you can get a higher tracker confidence.

So you will get a dropping tracker confidence when you rotate the object

But on the other hand, there are the DCF Filter Learning parameters, which should cope with these gradual changes (rotation etc.): “If the visual appearance of the target objects is expected to vary quickly over time, one may employ a high learning rate for better adaptation of the correlation filter to the changing appearance”.

As mentioned before, the NvDCF tracker filter won’t update in your use case, as only a one-time bbox is sent to the tracker. This means the DCF Filter Learning parameters won’t take effect, since the learning is stopped in your use case. Please send a correct bbox to the tracker at some interval to start the DCF Filter Learning. Then you can fine-tune the learning parameters.
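
For example, if your app uses the Python bindings (pyds), a buffer probe on nvtracker’s sink pad can re-inject a bbox every N frames so that data association succeeds and the filter keeps learning. This is only a rough, untested sketch; the bbox, class id, interval and the probe function itself are placeholders, and in practice the re-injected bbox should be a correct, up-to-date one rather than a fixed constant:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

BBOX = (640.0, 360.0, 60.0, 60.0)         # left, top, width, height: placeholder for your 60x60 object
INTERVAL = 30                             # re-send a bbox every 30 frames; tune as needed
UNTRACKED_OBJECT_ID = 0xFFFFFFFFFFFFFFFF  # value the SDK uses for "not yet tracked" objects

def inject_bbox_probe(pad, info, user_data):
    # Inject an NvDsObjectMeta every INTERVAL frames before nvtracker sees the buffer.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        if frame_meta.frame_num % INTERVAL == 0:
            obj_meta = pyds.nvds_acquire_obj_meta_from_pool(batch_meta)
            obj_meta.rect_params.left = BBOX[0]
            obj_meta.rect_params.top = BBOX[1]
            obj_meta.rect_params.width = BBOX[2]
            obj_meta.rect_params.height = BBOX[3]
            obj_meta.confidence = 1.0     # must not be below minDetectorConfidence (1.0 in your config)
            obj_meta.class_id = 0
            obj_meta.object_id = UNTRACKED_OBJECT_ID
            pyds.nvds_add_obj_meta_to_frame(frame_meta, obj_meta, None)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attach it to nvtracker's sink pad, e.g.:
# tracker.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, inject_bbox_probe, 0)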

Hello @kesong,

I’ve already tried VPI’s KLT Feature Tracker and nvtracker, but they stop tracking as soon as the object’s appearance/attitude changes over time.

Is there something similar to this video that NVIDIA offers as part of DeepStream, VPI, etc.?

Thank you!

You can’t specify one object and track it in DeepStream.
Can you try this sample: VPI - Vision Programming Interface: KLT Bounding Box Tracker?

Hello @kesong,

You can’t specify one object and track it in DeepStream.

But that’s exactly what nvtracker does: supply a bbox and it will track it over time, with lots of tuning and tweaking options. I’m not asking about the actual selection of the bbox (UI-wise), but rather about the tracking algorithm itself. I’m not sure I understand what you meant.

Can you try this sample: VPI - Vision Programming Interface: KLT Bounding Box Tracker?

Yes, I have tried VPI KLT; I even wrote in the previous reply that “they stop tracking as soon as the object’s appearance/attitude changes over time.”

Tracking is part of DeepStream for better or worse, and the tracking capability shown in this video (NOT the UI for bbox selection, which is the application’s responsibility) is a very basic use case that’s worth considering for the next version, don’t you think?

Best regards.

Yes, nvtracker will track all objects output from the PGIE detector.

@kesong, unfortunately I couldn’t relate your reply to my question; nvtracker doesn’t know and doesn’t care whether the bbox came from a PGIE or whether a user manually drew a box on a frame.

Anyway, I’ve built OpenCV 4.7 with CUDA support. Is it possible to run one of the many trackers available in OpenCV on the GPU? Out of the box, CPU utilization is very high, so the GPU is a must if you need to leave any CPU for other tasks.
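
From what I can tell, the classic trackers (CSRT, KCF, MOSSE) are CPU-only, and the only GPU path seems to be the DNN-based trackers through the CUDA DNN backend. Something like the sketch below is what I have in mind (just a sketch; it assumes the DaSiamRPN ONNX models from the OpenCV samples are downloaded locally, and I haven’t verified the actual GPU utilization):

import cv2

params = cv2.TrackerDaSiamRPN_Params()
params.model = "dasiamrpn_model.onnx"             # ONNX files from the OpenCV samples, assumed to be local
params.kernel_cls1 = "dasiamrpn_kernel_cls1.onnx"
params.kernel_r1 = "dasiamrpn_kernel_r1.onnx"
params.backend = cv2.dnn.DNN_BACKEND_CUDA         # requires the CUDA-enabled OpenCV build
params.target = cv2.dnn.DNN_TARGET_CUDA
tracker = cv2.TrackerDaSiamRPN_create(params)

cap = cv2.VideoCapture("20230323-201559.ts")
ok, frame = cap.read()
tracker.init(frame, (640, 360, 60, 60))           # x, y, w, h: placeholder bbox for the 60x60 object

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)
    print(found, bbox, tracker.getTrackingScore())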

Thank you.

There has been no update from you for a while, so we are assuming this is not an issue anymore and are closing this topic. If you need further support, please open a new one. Thanks.

Please consult the OpenCV experts about the trackers in OpenCV.

Thanks.