NVIDIA Jetson detectnet: increasing latency

I'm running one of the Hello AI World demos (detectnet), and everything works well at first, but the latency soon starts to grow to absurd levels. For example, when the demo starts, I get the performance stats and some logging immediately when a person is detected in frame. A few minutes in, it takes about ten seconds before I see any output indicating a person was detected. That delay grows and grows until the program crashes or just hangs.

There's no obvious reason for the delay (none that I can find, anyway). The timing report shows that each frame is processed quickly:
[TRT] ------------------------------------------------
[TRT] Timing Report networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.06708ms CUDA 1.17891ms
[TRT] Network CPU 48.02983ms CUDA 37.19260ms
[TRT] Post-Process CPU 0.05495ms CUDA 0.05500ms
[TRT] Visualize CPU 0.22068ms CUDA 1.53182ms
[TRT] Total CPU 48.37255ms CUDA 39.95833ms
[TRT] ------------------------------------------------

And the Jetson is only using 1.6 GB of its 4 GB of RAM. CPU usage sits between 20% and 50% for each core, there is plenty of free space on the SD card, and iostat shows very low wait times.

Hi,

Could you test the sample in --headless mode first?
This turns off the display, so no extra GPU resources are required for rendering.
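If it helps, running the Python sample headless is essentially equivalent to the sketch below (based on the detectnet.py example; the RTSP URL is just a placeholder for your stream):

# Rough sketch of detectnet.py in headless mode: no display sink is created,
# so no GPU time is spent on rendering. The RTSP URL is a placeholder.
import sys
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("rtsp://192.168.1.10:554/stream", argv=sys.argv)
output = jetson.utils.videoOutput("", argv=sys.argv + ["--headless"])

while True:
    img = source.Capture()           # grab the next decoded frame
    detections = net.Detect(img)     # run SSD-Mobilenet-v2 on it
    output.Render(img)               # does nothing visible when headless
    net.PrintProfilerTimes()         # prints the TensorRT timing report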

Since the Nano has limited resources, the latency might come from the workload of the output binding.
Thanks.

Headless mode makes no difference; the latency still grows in the same way.

I've tested the onboard camera and the growing latency doesn't happen there. It only occurs with the RTSP source.

I stumbled across these two related threads, where people seem to have had the same issue, but neither proposed solution has helped. I was hopeful that setting latency=0 and drop-on-latency on the RTSP source would help, but alas they did not.
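For reference, this is roughly the kind of pipeline I was experimenting with (a sketch using the GStreamer Python bindings; the URL, H.264 element chain, and decoder are assumptions about my camera, not the exact pipeline jetson-inference builds internally):

# Sketch: RTSP source with latency=0 and drop-on-latency=true on rtspsrc.
# The camera URL is a placeholder, and the decoder element (omxh264dec here)
# varies by JetPack version; this is not the pipeline jetson-inference uses.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.parse_launch(
    "rtspsrc location=rtsp://192.168.1.10:554/stream latency=0 drop-on-latency=true ! "
    "rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw ! "
    "fakesink sync=false"
)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
# Block until an error or end-of-stream so the pipeline keeps running
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)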

As an update: I've reduced the stream from 1080p/30fps to a lower resolution, and the latency is completely gone. This makes sense on the surface, but I'd like to understand a bit more about how to watch for and mitigate this sort of behavior.

Is there any way to see/monitor how much of a queue of pending frames has built up? I'd like to dump the queue if it reaches a certain level of latency.
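In the absence of a queue-depth API, the rough idea I have in mind is to time the capture/inference loop myself and treat any sustained gap between the per-frame processing time and the stream's frame interval as backlog (a sketch using the jetson-inference Python bindings; the 30 fps rate and RTSP URL are assumptions about my stream):

# Sketch: estimate how far the inference loop is falling behind a 30 fps stream.
# The RTSP URL is a placeholder; the backlog figure is only an approximation.
import time
import jetson.inference
import jetson.utils

STREAM_FPS = 30.0
FRAME_INTERVAL = 1.0 / STREAM_FPS

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("rtsp://192.168.1.10:554/stream")

backlog = 0.0      # estimated seconds of video we have fallen behind
while True:
    start = time.monotonic()
    img = source.Capture()
    detections = net.Detect(img)
    elapsed = time.monotonic() - start

    # If one iteration takes longer than one frame interval, roughly
    # (elapsed - FRAME_INTERVAL) seconds of stream accumulate upstream.
    backlog = max(0.0, backlog + (elapsed - FRAME_INTERVAL))
    if backlog > 2.0:
        print("~{:.1f}s behind the live stream - time to drop frames".format(backlog))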

Also, when looking at the GPU and CPU usage, neither ever really maxed out, so I'm wondering why the latency was there if more resources were available.

I'm glad to have a solution in the lower-resolution feed, but I'm still a bit lost when it comes to really grasping why this was happening.

Lastly, any ideas on how to expand the capability of this solution would be appreciated. This is only one camera, and I'd like to monitor four in a similar way. It doesn't seem like I have any performance headroom at all here. Some ideas:
- Is there any way to use two or more Nanos to divide the load? Hopefully a better ratio than one Nano per camera can be had.
- Would dropping every other frame be possible or advisable to improve performance? (See the sketch after this list.)
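For that second idea, the naive version I'm picturing just skips inference on alternate frames in the capture loop (a sketch against the jetson-inference Python API; the URL is a placeholder, and it assumes capture is cheap relative to inference):

# Sketch: consume every frame from the source, but run detection on every other one.
# The RTSP URL is a placeholder.
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("rtsp://192.168.1.10:554/stream")

frame_index = 0
while True:
    img = source.Capture()        # still pull the frame so it doesn't queue up
    if frame_index % 2 == 0:      # ...but only pay for inference half the time
        for d in net.Detect(img):
            print(net.GetClassDesc(d.ClassID), d.Confidence)
    frame_index += 1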

Hi,

jetson-inference uses GStreamer as the backend camera interface,
so you can check whether any configuration can be applied from the GStreamer side.

Another alternative is to try our DeepStream SDK.
The library is designed around a multimedia pipeline, so you have more control over the RTSP source.
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_ref_app_deepstream.html#source-group
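For example, the deepstream-app reference application exposes per-source settings in its config file, along these lines (an illustrative snippet; the URI and values are placeholders to adapt):

# Illustrative deepstream-app source group; URI and values are placeholders
[source0]
enable=1
# type 4 = RTSP
type=4
uri=rtsp://192.168.1.10:554/stream
# jitter buffer size for the RTSP source, in ms
latency=200
# decode only every Nth frame (0 = decode all frames)
drop-frame-interval=2
num-sources=1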

Thanks.

Thank you for the reply, but I think you may have missed my last update here; I don't have an issue with controlling the RTSP stream, as I was able to find where the GStreamer configuration is set. I don't want to just try a new SDK blindly so much as understand the current limitation I'm hitting (which doesn't appear to be CPU or GPU usage, at a glance).

I had posted a few questions in my last update, but primarily what I’m interested in is:

  1. What is happening? I get the feeling this may be a hardware limitation, so I'm OK with lowering the resolution as a fix; I just want to understand it better.

  2. How can I see/monitor what is happening? Ideally, I'd like to watch for a growing backlog of frames waiting to be analyzed and dump them if that occurs.

FYI - for updates on streaming latency/performance improvements in jetson-inference, please see this topic regarding the integration of NVMM memory:

There was extra CPU overhead in the jetson-inference GStreamer code from doing a memcpy() in the appsink element. I have recently integrated NVMM memory, so frames can go straight to the GPU now. So far this is integrated and working for RTP/RTSP/video files captured through the videoSource interface. I am working on adding support for MIPI CSI and V4L2 cameras.


Ooooh, this seems promising, thank you dusty_nv!

OK, update on this topic: NVMM support for CSI/V4L2 cameras has been integrated into jetson-utils here:

https://github.com/dusty-nv/jetson-utils/commit/b38357bbe33640613acb7616fd7e675adbeaab2a