NVIDIA Jetson detectNet increasing latency

I'm running one of the Hello AI World demos (detectnet), and everything works well at first, but soon the latency starts to grow to unbelievably high levels. For example, when the demo starts, I get the performance stats and some log output immediately whenever a person is detected in frame. A few minutes in, it takes about ten seconds before I see any output indicating a person was detected. That delay keeps growing until the program crashes or simply hangs.

There’s no obvious reason for the delay (that I can find, anyway). The performance report shows that each frame is processed very quickly:
[TRT] ------------------------------------------------
[TRT] Timing Report networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.06708ms CUDA 1.17891ms
[TRT] Network CPU 48.02983ms CUDA 37.19260ms
[TRT] Post-Process CPU 0.05495ms CUDA 0.05500ms
[TRT] Visualize CPU 0.22068ms CUDA 1.53182ms
[TRT] Total CPU 48.37255ms CUDA 39.95833ms
[TRT] ------------------------------------------------

And the Jetson is only using 1.6 GB of its 4 GB of RAM. CPU usage sits between 20% and 50% per core. There's plenty of free space on the SD card as well, and iostat shows very low wait times.
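
For reference, here is roughly what the demo's main loop does, with wall-clock timing added per iteration so the end-to-end delay would show up even when the TensorRT stage timings look fine (a sketch, assuming the Python jetson.inference / jetson.utils API from Hello AI World; the RTSP URL is a placeholder):

import time
import jetson.inference
import jetson.utils

# Sketch only: assumes the Python Hello AI World API; the RTSP URL is a placeholder.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("rtsp://<camera-address>/stream")
output = jetson.utils.videoOutput("display://0")

while True:
    t0 = time.monotonic()
    img = source.Capture()        # blocks until the next decoded frame arrives
    detections = net.Detect(img)
    output.Render(img)

    # Wall-clock time per iteration: if the delay between an event and its log
    # output keeps growing while this number and the TensorRT totals stay small,
    # frames must be queuing up somewhere upstream of the network.
    print("loop: {:.1f} ms, {} detections".format(
        (time.monotonic() - t0) * 1000, len(detections)))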

Hi,

Could you test the sample with --headless mode first?
This will turn off the display, so no extra GPU resources are required for rendering.

Since the Nano has limited resources, the latency might come from the workload of the output binding.
Thanks.

Headless mode has the same result; the latency still grows.
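
In other words, with headless mode the render path is skipped entirely and the loop reduces to something like this (sketch, same assumed Python API as above; the stock detectnet.py --headless flag should amount to the same thing), yet the latency still grows:

import jetson.inference
import jetson.utils

# Sketch: same loop, but with no videoOutput / Render call, so the GPU
# does no display rendering work at all.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("rtsp://<camera-address>/stream")

while True:
    img = source.Capture()
    detections = net.Detect(img)
    print("{} objects detected".format(len(detections)))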

I’ve tested the onboard camera and the growing latency doesn’t happen there. It only occurs with the RTSP source.

I stumbled across a couple of related threads where people seem to have had the same issue, but neither proposed solution has helped. I was hopeful that delay=0 and drop-on-latency would do the trick, but alas they did not.

As an update: I’ve reduced the stream from 1080p/30 fps to a lower resolution, and the latency is completely gone. This makes sense on the surface, but I’d like to understand a bit more about how to watch for and mitigate this sort of behavior.

Is there any way to see or monitor how much of a queue has built up? I’d like to dump the queue if it reaches a certain level of latency.
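
One idea: approximate the backlog in software by comparing how many frames the loop has consumed against how many a 30 fps source should have produced by then, and skip inference when the gap gets large. A rough sketch (same assumed Python API as above; the fps value, threshold, and URL are placeholders, and the estimate will drift if the source itself ever drops frames):

import time
import jetson.inference
import jetson.utils

STREAM_FPS = 30.0          # nominal rate of the RTSP source (placeholder)
MAX_BACKLOG = 15           # frames of lag tolerated before dumping (placeholder)

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("rtsp://<camera-address>/stream")

start = time.monotonic()
consumed = 0

while True:
    img = source.Capture()
    consumed += 1

    # Rough backlog estimate: frames the source should have produced so far
    # minus frames this loop has actually consumed.
    produced = (time.monotonic() - start) * STREAM_FPS
    backlog = produced - consumed

    if backlog > MAX_BACKLOG:
        # "Dump the queue": keep capturing (and discarding) frames without
        # running inference until the loop catches back up to the live stream.
        continue

    detections = net.Detect(img)
    print("backlog ~{:.0f} frames, {} detections".format(backlog, len(detections)))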

Also, looking at GPU and CPU usage, neither ever really maxed out, so I’m wondering why the latency appeared when there were still resources available.

I’m glad to have a workaround in the lower-resolution feed, but I’m still a bit lost as to why this was happening in the first place.

Lastly, any ideas for expanding the capability of this setup would be appreciated. This is only one camera, and I’d like to monitor four in a similar way, but it doesn’t seem like I have any performance headroom at all here. Some ideas:
- Is there any way to use two or more Nanos to divide the load? Hopefully a better ratio than one Nano per camera can be achieved.
- Would dropping every other frame be possible or advisable to improve performance? (A rough sketch of what I mean follows this list.)
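
For the frame-dropping idea above, something like this is what I have in mind (sketch, same assumed Python API as above): every frame is still captured so decoded frames don't pile up, but the network only runs on every Nth one.

import jetson.inference
import jetson.utils

SKIP = 2                   # run inference on every 2nd frame (placeholder)

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("rtsp://<camera-address>/stream")

frame = 0
while True:
    img = source.Capture()     # always capture, so the pipeline keeps draining
    frame += 1
    if frame % SKIP != 0:
        continue               # skip inference on this frame
    detections = net.Detect(img)
    print("{} detections".format(len(detections)))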