Nvtracker NvDCF VPI execution control

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
jetson nano 4GB

• DeepStream Version

• JetPack Version (valid for Jetson only)

• TensorRT Version

• NVIDIA GPU Driver Version (valid for GPU only)

• Issue Type (questions, new requirements, bugs)

• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)

• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or sample application, and the function description.)

Is there a way to control how the nvtracker NvDCF/VisualTracker feature extraction (ColorNames and/or HOG) is scheduled?

My observations:

The nvtracker NvDCF can be configured to use feature extraction (ColorNames and/or HOG). This feature extraction leverages the VPI library, which in turn performs CUDA stream map/sync operations, etc.
So far I have not found a mechanism to control when the underlying gst-nvtracker library (which is highly multithreaded) should start triggering these calls.
From time to time, these calls get scheduled during other primary and/or secondary inferences.
This drives longer inference times, reduces determinism, and from time to time exceeds the frame interval I am allowing for the required tasks, creating a cascade effect of pending inferences.
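For context, the ColorNames/HOG feature extraction mentioned above is enabled through the tracker's low-level config file. A minimal sketch of the relevant section, assuming a DS 6.x NvMultiObjectTracker YAML config such as config_tracker_NvDCF_perf.yml (exact key names and defaults depend on the DeepStream release):

```yaml
# Sketch of the FeatureExtractor section of an NvDCF tracker config.
# Values are illustrative, not the poster's actual settings.
FeatureExtractor:
  useColorNames: 1   # enable ColorNames features (VPI-accelerated)
  useHog: 1          # enable HOG features (VPI-accelerated)
```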

I know that there is enough time available before or after my PGIE/SGIE inferences (I used Nsight Systems to check that). If only I could instruct the nvtracker to perform its job outside the PGIE/SGIE inferences, it would work…

So the question is: is there such a mechanism available (API/configuration/etc.)?

Ideal scheduling (nvtracker does not overlap PGIE or SGIE inference)

Problematic scheduling (nvtracker overlaps PGIE or SGIE inference)

Whenever the nvtracker VPI calls occur while an inference is ongoing, the inference execution time is noticeably impacted (cf. the first 3 SGIE inferences on the bottom graph).

Note: I have no problem adjusting cpp code (gstnvinfer.cpp, nvdsinfer_context_impl.cpp, or other), if necessary.

Parallel processing is expected in order to maximize performance. What application are you running? Do you see any performance impact when running your application?

Application is multi-stream real-time facial recognition:
Input stream(s) => Muxer => Face-Detection & landmarks (PGIE) => MOT Tracker => Alignment & Face-Recognition (SGIE) => [optional] Tiler => [optional] OSD => Output stream sink.
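For reference, the pipeline above could be sketched as a gst-launch line like the following (purely illustrative; every config-file path, URI, size, and sink choice is a placeholder, not the poster's actual setup):

```shell
# Hypothetical sketch of the described pipeline on DeepStream 6.x.
gst-launch-1.0 \
  nvstreammux name=mux batch-size=4 width=1280 height=720 batched-push-timeout=33000 ! \
  nvinfer config-file-path=pgie_face_detect.txt ! \
  nvtracker ll-lib-file=/opt/nvidia/deepstream/lib/libnvds_nvmultiobjecttracker.so \
            ll-config-file=config_tracker_NvDCF_perf.yml ! \
  nvinfer config-file-path=sgie_face_recog.txt ! \
  nvmultistreamtiler ! nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! \
  rtph264pay ! udpsink host=127.0.0.1 port=5000 sync=true qos=true \
  uridecodebin uri=rtsp://camera-0/stream ! mux.sink_0 \
  uridecodebin uri=rtsp://camera-1/stream ! mux.sink_1
```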

Performance impact: whenever this behavior happens too often, pending inference batches are scheduled too close to each other (in time), and the tracker has no choice but to overlap, which compounds the problem. At some point this triggers a large number of frames being skipped (my application mandates ‘sync=TRUE’ & ‘qos=TRUE’). I’m wondering if there is a way to tell the tracker to schedule VPI calls outside PGIE or SGIE inference so it does not trigger this problem (or any other approach to avoid it).

I’m already scheduling PGIE with interval=1 (every 66ms). Under ideal conditions, PGIE normally takes ~10ms (full frame / batch-size = number of input streams) and SGIE should take ~15/20ms (batch-size=16). This should leave enough time (at least 15ms) for the tracker to do its job without having to overlap PGIE/SGIE.
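As a rough sanity check on the numbers above, the per-cycle budget can be sketched numerically. This assumes a 30 fps source (so interval=1 means PGIE runs every second frame, ~66 ms apart) and that SGIE runs on every frame, i.e. twice per PGIE cycle; both are my assumptions, inferred from the figures quoted above rather than stated explicitly:

```python
# Rough timing budget for one PGIE cycle, using the figures quoted above.
frame_ms = 1000 / 30              # ~33.3 ms per frame at 30 fps (assumed)
pgie_cycle_ms = 2 * frame_ms      # interval=1 -> PGIE every ~66 ms
pgie_ms = 10                      # full-frame detection, batched
sgie_ms = 20                      # upper bound per SGIE batch

# SGIE assumed to run on both frames of the cycle.
idle_ms = pgie_cycle_ms - pgie_ms - 2 * sgie_ms
print(f"idle budget per PGIE cycle: {idle_ms:.1f} ms")

# Consistent with the "at least 15 ms" of headroom claimed above.
assert idle_ms >= 15
```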

I also verified that if I use a pure-CPU tracker (the KLT one from DS 5.1, ported to DS 6.0.1 thanks to gst compatibility), I do not encounter the problem. Alas, this ‘deprecated’ tracker does not cope well with full/partial occlusions and generates too many tracks for the same person, so I need to stick with the NvDCF one. It’s really an awesome tracker! If only I could avoid this overlap problem…

Thanks in advance.

Screenshot of rtsp output from application running on jetson nano using 4 input streams:

There’s no explicit way of controlling the CUDA scheduling among multiple modules/threads running concurrently in a DeepStream pipeline. Each module in DS and VPI uses its own CUDA stream, and I’m not sure whether allowing priorities among them (through configuration params) would help. If it would, we may consider adding that support.
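For what it's worth, the raw CUDA mechanism such a configuration knob would presumably build on is stream priorities. DeepStream does not currently expose this for the tracker; the sketch below only shows the underlying runtime API, with the stream names being my own illustrative choices:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int least = 0, greatest = 0;
    // Query the stream priority range supported by the device.
    // Numerically lower values mean higher scheduling priority.
    if (cudaDeviceGetStreamPriorityRange(&least, &greatest) != cudaSuccess) {
        printf("No CUDA device available\n");
        return 0;
    }
    printf("priority range: least=%d greatest=%d\n", least, greatest);

    cudaStream_t inferStream, trackerStream;
    // Hypothetically: inference work on the highest-priority stream...
    cudaStreamCreateWithPriority(&inferStream, cudaStreamNonBlocking, greatest);
    // ...tracker/VPI work on the lowest-priority stream, so ready
    // inference kernels are scheduled ahead of pending tracker kernels.
    cudaStreamCreateWithPriority(&trackerStream, cudaStreamNonBlocking, least);

    cudaStreamDestroy(inferStream);
    cudaStreamDestroy(trackerStream);
    return 0;
}
```

Note that priorities only influence how the hardware scheduler picks among kernels that are already queued; they do not preempt a running kernel, so this would mitigate rather than fully eliminate the overlap.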

Hi pshin,
Thanks a lot for the timely reply.
Based on the nisght-system report, and my current understanding/hypothesis of what might be going on, it ‘seems’ that it would help to have such a priority control (at least, for my application need, VPI < PGIE/SGIE). If you ever come around implementing such a mechanism (on nano, or ‘nano next’ when it’ll be out), I would gladly integrate early versions and provide feedback versus my use case.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.