How to smooth the bbox detections with the DCF tracking in DS5.0 GA?

@jasonpgf2a Unfortunately I cannot release the custom tracker, but I can refer you to this repo, which was a HUGE help for me in creating my own. Scroll down to Section 3 in the readme.

Of course, nothing in this repo is optimized or hardware accelerated; however, you can utilize the Jetson's VPI API to accelerate the KLT portion.
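For anyone who wants a plain-CPU reference for that KLT step, here is a minimal sketch using OpenCV's pyramidal Lucas-Kanade; the window size, pyramid depth, and termination criteria are illustrative defaults, not tuned values, and on Jetson the VPI library provides hardware-accelerated equivalents of this kind of operation.

import cv2

def track_klt(prev_gray, curr_gray, prev_pts):
    # Propagate feature points from the previous frame to the current one
    # with pyramidal Lucas-Kanade optical flow (the classic KLT step).
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    good = status.ravel() == 1
    return prev_pts[good], curr_pts[good]

# Seed points inside each detection bbox with cv2.goodFeaturesToTrack,
# e.g. pts = cv2.goodFeaturesToTrack(prev_gray[y:y+h, x:x+w], 50, 0.01, 5),
# offset them back to frame coordinates, and feed them to track_klt().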

Let me know if you have any further questions.

Hello rsc44,

Our internal evaluations all indicate that NvDCF is much better than KLT in terms of accuracy and robustness. If you got unsatisfactory results from NvDCF, we can help you with that. Could you share feedback on the occasions where NvDCF didn't work well?

Maybe the hardware acceleration is the key…

“you can utilize the Jetson’s VPI API to accelerate the KLT portion”

@pshin Thank you for replying.

At first, I was a big fan and supporter of NvDCF, but from my internal testing (Office, Groceries, Home, Street) NvDCF has three major issues. Issue #1 is why I had to take it out of my development cycle. It's a good tracker, don't get me wrong, but it is not the best base for things like people counting, face-rec, etc.

  1. Identity switching (two people cross paths, one takes on the ID of the other, while the other gets a new ID). This is a problem with all trackers, yes, but NvDCF is more likely than others to swap the same tracking ID around to multiple people; most other trackers I've tested do not "switch", they simply create a new incremented tracking ID. For face recognition I utilize trackers so that I can perform face-rec per tracking ID, which lets me conserve GPU resources. So if I perform face-rec on ID 1, and then tracking ID 1 switches to someone new, the system fails to do its job.

Thus it is better to have a tracker that is more likely to create new tracking IDs for the same person (KLT), since those tracklet paths can be appended together based on face ID, than one that is more likely to switch one tracking ID between multiple people (NvDCF). With the latter there is no way to trust that one tracking ID = one person; yes, there is less appending of paths, but there is no trivial method to check if/where/when NvDCF switched people, so facial recognition has to run more often to perform sanity checks (see the sketch after this list).

  2. Bbox flutter. Correlation filters can cause the bbox to shrink/grow in real-world environments due to background noise. Again, this is bad for tracking-based inference, since the bbox is not in the correct position.

  3. It decreases the total # of streams, and it has a limit on the # of objects it can track per stream. (Better to use the GPU for face-rec or ReID so you can match the tracklets together.)
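To make the face-rec-per-tracking-ID bookkeeping from item 1 concrete, here is a rough sketch of how tracklet paths can be appended together by face ID. recognize_face is a placeholder for whatever face-rec model is used, so treat this as a mental model rather than my actual pipeline.

from collections import defaultdict

class TrackletMerger:
    def __init__(self, recognize_face):
        self.recognize_face = recognize_face   # crop -> face ID, or None if unsure
        self.track_to_face = {}                # tracking ID -> face ID
        self.paths = defaultdict(list)         # face ID -> merged bbox path

    def update(self, track_id, bbox, crop):
        # Pay the face-rec cost only the first time a tracking ID shows up.
        if track_id not in self.track_to_face:
            face_id = self.recognize_face(crop)
            if face_id is None:
                return                         # retry on a later frame
            self.track_to_face[track_id] = face_id
        # Every tracklet that resolves to the same face extends one path,
        # so new incremented IDs for the same person are harmless.
        self.paths[self.track_to_face[track_id]].append(bbox)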

NvDCF is better than nvKLT; however, my suggestion above is not to use nvKLT. Rather, I recommended a custom recipe of KLT + Kalman filters (SORT) as a starting spot. In my tests, KLT + KF was a more useful tracker than NvDCF.
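As a rough starting spot for that recipe, here is a heavily simplified sketch: greedy IoU matching plus a crude constant-velocity motion model standing in for the Kalman filter. Real SORT uses Hungarian matching and a proper Kalman state on center/scale/aspect, and the KLT flow can refine each track's predicted box before matching.

import numpy as np

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

class Track:
    def __init__(self, track_id, bbox):
        self.id = track_id
        self.bbox = np.asarray(bbox, dtype=float)
        self.vel = np.zeros(4)                 # crude constant-velocity state

    def predict(self):
        self.bbox = self.bbox + self.vel       # where we expect the box now

    def update(self, det, alpha=0.5):
        det = np.asarray(det, dtype=float)
        self.vel = alpha * (det - self.bbox) + (1 - alpha) * self.vel
        self.bbox = det

class GreedyIouTracker:
    def __init__(self, iou_thresh=0.3):
        self.tracks, self.next_id, self.iou_thresh = [], 1, iou_thresh

    def step(self, detections):                # detections: list of [x1, y1, x2, y2]
        for t in self.tracks:
            t.predict()
        unmatched = list(detections)
        for t in self.tracks:
            if not unmatched:
                break
            best = max(unmatched, key=lambda d: iou(t.bbox, d))
            if iou(t.bbox, best) >= self.iou_thresh:
                t.update(best)
                unmatched.remove(best)
        for det in unmatched:                  # unmatched detections get NEW IDs
            self.tracks.append(Track(self.next_id, det))
            self.next_id += 1
        return [(t.id, t.bbox.tolist()) for t in self.tracks]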

From my observations, correlation filters are much better than KLT at identifying unique features of an ROI, but it seems to me that NvDCF has trouble differentiating the background from the bbox ROI. It NEEDS to perform some form of background subtraction prior to the correlation filters so that it can get a better estimate of where the human/object actually is inside the ROI. This would help tremendously with the ID switching.
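Just to illustrate the kind of background suppression I mean (this is not how NvDCF works internally; it is only a sketch using OpenCV's MOG2 model): zero out the estimated background pixels inside each detection ROI before the patch is used for correlation-filter learning.

import cv2

bg_sub = cv2.createBackgroundSubtractorMOG2(history=300, detectShadows=False)

def foreground_rois(frame, bboxes):
    # Update the background model once per frame, then keep only the
    # foreground pixels inside each detection ROI (bbox = x, y, w, h).
    fg_mask = bg_sub.apply(frame)              # 0 = background, 255 = foreground
    patches = []
    for x, y, w, h in bboxes:
        roi = frame[y:y + h, x:x + w]
        mask = fg_mask[y:y + h, x:x + w]
        patches.append(cv2.bitwise_and(roi, roi, mask=mask))
    return patches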

If NVIDIA were to release the source code of NvDCF, then maybe the community could collectively improve on those accelerated correlation filters.

Side note: What is the internally evaluated MOTA of NvDCF? In my tests it ranges between 20-51% depending on the setting.
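For context, by MOTA I mean the standard CLEAR-MOT metric; the per-sequence counts come from whatever MOT evaluation tool is used.

def mota(false_negatives, false_positives, id_switches, num_gt_boxes):
    # MOTA = 1 - (FN + FP + IDSW) / total ground-truth boxes; it can go negative.
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt_boxes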


Hello rsc44,

Sorry for the delay in response, and thank you so much for sharing your experience. Your feedback is really valuable, and we would like to help you make the best use of the NvDCF tracker for your use case.

Regarding the #1 Identity Switching issue, it sounds to me that in your current setting you get false ID switches a lot more often than desired. In your use case, it would be better to get robust tracking even if the tracklets get shorter, because you can re-associate them using face-rec. For that, I would make the data association criteria much stricter, so that only highly similar bboxes are associated across frames. Below is a set of params that you can set as the minimum qualifications for various object aspects. Please try increasing the min thresholds for these params. You can also start with high values and then decrease them to find a good balance for your use case.

# [Data Association] Thresholds in matching scores to be considered as a valid candidate for matching
minMatchingScore4Overall: 0.0   # Min total score
minMatchingScore4SizeSimilarity: 0.5    # Min bbox size similarity score
minMatchingScore4Iou: 0.1       # Min IOU score
minMatchingScore4VisualSimilarity: 0.2    # Min visual similarity score
minTrackingConfidenceDuringInactive: 1.0  # Min tracking confidence during INACTIVE period. If tracking confidence is higher than this, then tracker will still output results until next detection 
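Conceptually (the actual NvDCF association code is not public, so take this purely as an illustration of what "minimum qualifications" means here), a detection/target pair is only considered a valid matching candidate if it clears every one of these minimum scores:

def is_valid_candidate(scores, cfg):
    # scores: per-pair similarity values in [0, 1]; cfg: the thresholds shown above.
    return (scores["size_similarity"] >= cfg["minMatchingScore4SizeSimilarity"]
            and scores["iou"] >= cfg["minMatchingScore4Iou"]
            and scores["visual_similarity"] >= cfg["minMatchingScore4VisualSimilarity"]
            and scores["overall"] >= cfg["minMatchingScore4Overall"])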

I expect this tuning would also mitigate the #2 Bbox Flutter issue. But the bbox shrinking/growing depends entirely on the bbox sizes from the detector, because the NvDCF tracker doesn't do scale adaptation on its own; it only estimates/predicts bbox sizes based on the bbox inputs from the detector. If you want to make the scale adaptation slower, you can try lowering the following param (in case you use the moving_avg estimator):

  trackExponentialSmoothingLr_scale: 0.3     # Learning rate for new scale

Or the following (in case you use the Kalman filter):

  kfProcessNoiseVar4Scale: 0.04   # Process noise variance for scale in Kalman filter
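To visualize what lowering the moving-average learning rate does (illustrative only, not the exact internal update): a smaller value lets the detector's new size pull the tracked scale more slowly.

def smooth_scale(prev_scale, detected_scale, lr=0.3):
    # With lr = 0.3 the new detection contributes 30% per update;
    # lowering lr makes the tracked scale change more gradually.
    return lr * detected_scale + (1.0 - lr) * prev_scale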

You can also try the instance-awareness feature by setting the following param:

 useInstanceAwareness: 1

This would help the correlation filter perform discriminative learning against nearby, same-class objects.

Please let us know how else we can help.