High detection latency on Jetpack 5.1

tomislav · March 3, 2023, 11:00am

Hi,
we have upgraded some of our Jetsons from JetPack 4.6 to JetPack 5.1 and noticed substantial latency spikes when running our PyTorch-based object detector.

Since we are unable to share our detector, we used Yolov7 for this post.
detector.zip (204.9 KB)

Base docker image on JetPack 4.6 was: nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3
and on JetPack 5.1 we tried: nvcr.io/nvidia/l4t-ml:r35.2.1-py3, nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3, as well as building PyTorch for 5.1.

We also tried Yolov5 and converting the Yolov7 model to a TensorRT .engine file, but all of these methods resulted in latency spikes. Please note that we would like to keep using PyTorch for the time being, and not migrate to DeepStream.

Finally, here are the latency comparison results:

tomislav · March 3, 2023, 11:04am

Since I’m a new user, I could not add more than one link in the post so here they are:
links.txt (260 Bytes)

AastaLLL · March 6, 2023, 4:23am

Hi,

Have you maximized the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Also, please note that the TensorRT engine needs to be regenerated on JetPack 5.1.
Thanks.

tomislav · March 6, 2023, 10:03am

Hi,
Thank you for the answer. We actually use nvpmodel mode 8 because it provides the 20W 6-core setup. Your suggestion turned off 4 of the 6 CPU cores, but that did not influence the performance. We also have jetson_clocks inactive on JetPack 4.6, and we don’t observe latency spikes there.

In the meantime, we have noticed that the detection spikes on JetPack 5.1 happen if the device has been idle for some time. In the attached image you can see the performance for the first run after the reboot, and for the second and third runs. As you can see, the first run settles at close to 40 ms latency, while all subsequent runs have latency spikes.

We could not find how much time the device needs to be idle before the spikes happen, but if the first run after a reboot is done after more than one hour, we always observe latency spikes. Sometimes we will get N consecutive runs without spikes, but once they start happening, they never go away, even if we let the detector run for more than one hour. On the other hand, if the first run did not have spikes, and we let it run for an hour, spikes don’t show up. Note that sometimes we observe latency spikes on the first run after the reboot, but that happens less than 20% of the time and when it happens spikes are less frequent and less prominent (max 160 ms).

We tried both keeping the Docker container running and restarting the container, but neither helped. This shows that the detector itself is not the issue and that the device is capable of performing with acceptable latency, but for some reason spikes always manifest after some idle time.
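For anyone comparing runs, a per-iteration timing loop along the following lines may help make the spikes visible. This is only a minimal sketch and not the timing code from detect.py: `measure_latencies`, `find_spikes`, the `sync` hook and the 50 ms threshold are illustrative names and values. For a PyTorch model on the GPU, `sync` would typically be `torch.cuda.synchronize`, so that queued GPU work is included in the measurement.

```python
import time
from typing import Callable, List, Optional


def measure_latencies(
    infer: Callable[[], object],
    n_iters: int = 100,
    sync: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Return the wall-clock latency in seconds of each call to `infer`.

    `sync` is an optional hook that is called after each inference and
    before the timer stops (assumption: for a GPU model you would pass
    something like torch.cuda.synchronize here).
    """
    latencies = []
    for _ in range(n_iters):
        start = time.perf_counter()
        infer()
        if sync is not None:
            sync()
        latencies.append(time.perf_counter() - start)
    return latencies


def find_spikes(latencies: List[float], threshold_s: float) -> List[int]:
    """Return the indices of iterations whose latency exceeds `threshold_s`."""
    return [i for i, t in enumerate(latencies) if t > threshold_s]


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace with one detector forward pass.
    lat = measure_latencies(lambda: sum(range(10_000)), n_iters=50)
    spikes = find_spikes(lat, threshold_s=0.05)
    print(f"max latency: {max(lat) * 1e3:.2f} ms, spike iterations: {spikes}")
```

Logging the spike indices for the first run after a reboot and for later runs would make the two cases easier to compare than reading them off a plot.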

You can test this with:

Comment out line 15 (import seaborn as sns) in utils/plots.py
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
docker run -it --rm --network host --runtime nvidia -v ${PWD}/detector/:/detector nvcr.io/nvidia/l4t-ml:r35.2.1-py3
cd /detector/ && pip install tqdm
python3 detect.py

AastaLLL · March 9, 2023, 7:34am

Hi,

Just want to confirm first.
The environment for Xavier NX outside of the container is also JetPack 5.1, is that correct?

Thanks.

tomislav · March 9, 2023, 11:59am

Yes, that is true. Otherwise the code would throw an error when we tried to run it, as we found out that r32 and r35 images can’t be run on a JetPack version they aren’t designed for.

AastaLLL · March 10, 2023, 5:34am

Hi,

Could you also help to check if the latency is stable without using a container?

Thanks.

tomislav · March 10, 2023, 12:21pm

Hi,

At the moment we don’t want to install stuff on the device and would like to use official Nvidia containers.

Did you manage to reproduce the issue?

tomislav · March 14, 2023, 11:43am

I’d like to attach the minimal working example here with the instructions in the Readme file:
detector_latency_test.zip (1.1 MB)

AastaLLL · March 15, 2023, 7:25am

Hi,

Thanks for sharing the example.
Will get back to you later.

Thanks.

AastaLLL · March 15, 2023, 8:17am

Hi,

We tested the minimal working example on Xavier NX with JetPack 5.1 but cannot reproduce the issue.

We repeated the test several times with nvpmodel 8 and jetson_clocks; the latency output stays between 0.03 and 0.04, with no value > 0.05.

Thanks.

abdo.babukr1 · March 16, 2023, 6:09pm

Hi @AastaLLL ,

That's great that you ran the example without observing major spikes. The spikes can be induced by external processes. When running the same external process alongside the detect script, we observe that JP51 is more susceptible to spikes than JP46, and this can make the difference between acceptable and unacceptable spikes.

For example, before you run the detect.py script again, run jtop on another terminal.
The plots below show the spikes without jtop running (left side) and the spikes with jtop running (right side). The JP51 spikes with jtop running (bottom right) reach >200ms above average latency. The JP46 spikes with jtop running (top right) reach ~50ms above average latency.

So my question is, why is JP51 more susceptible to major spikes when running an external process? We have experimented with isolating CPU cores, which reduced the spikes, but we suspect processes can still interfere on the hardware side, especially if they are using the GPU.
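The post does not say how the CPU isolation was done. One possible way to try something similar from inside the script is to pin the detector process to a fixed set of cores, sketched below. This is an assumption and not the poster's method: `pin_to_cpus` is a hypothetical helper, `os.sched_setaffinity` is Linux-only, and pinning one process does not stop other processes from being scheduled on the same cores (that would need kernel-level isolation).

```python
import os
from typing import Iterable, Set


def pin_to_cpus(cpus: Iterable[int]) -> Set[int]:
    """Restrict the current process to `cpus` and return the resulting affinity.

    Linux-only: os.sched_getaffinity / os.sched_setaffinity are not
    available on every platform.
    """
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    requested = set(cpus) & allowed     # drop cores that are not available
    if not requested:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(0, requested)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Example: keep the process on the highest-numbered available core.
    core = max(os.sched_getaffinity(0))
    print("now restricted to:", pin_to_cpus([core]))
```

Calling this before loading the model keeps the Python process and the threads it starts afterwards on the chosen cores, but it cannot address contention on the GPU itself.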

The standard deviation is 8 ms for JP46, whereas it is 54 ms for JP51.
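The figures quoted above (standard deviation and spike height above the average latency) can be computed from a list of per-frame latencies roughly as follows. This is a sketch under assumptions: `summarize_latencies` is an illustrative name, the sample values are made up rather than measured, and the post does not say whether a population or sample standard deviation was used (population is used here).

```python
import statistics
from typing import Dict, List


def summarize_latencies(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarize per-frame latencies given in milliseconds."""
    mean = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean,
        # Population standard deviation (assumption, see the note above).
        "std_ms": statistics.pstdev(latencies_ms),
        # Height of the worst spike relative to the average latency.
        "max_above_mean_ms": max(latencies_ms) - mean,
    }


if __name__ == "__main__":
    # Made-up illustrative values, not data from this thread.
    sample = [40.0, 40.0, 40.0, 80.0]
    for name, value in summarize_latencies(sample).items():
        print(f"{name}: {value:.2f}")
```

Reporting all three numbers for each JetPack version, with and without jtop running, would make the comparison reproducible by others.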

AastaLLL · March 21, 2023, 6:31am

Hi,

We need to reproduce this issue to know the reason why JetPack 5.1 has longer spikes.

Is the external process jtop?
Do you mean we can reproduce this issue with the above sample along with jtop?

Thanks.

abdo.babukr1 · March 21, 2023, 3:33pm

Yes you can reproduce this with jtop as described above.

AastaLLL · March 22, 2023, 8:21am

Hi,

Confirmed that we can reproduce the issue with jtop run concurrently.

We are discussing this with our internal team.
Will share more information with you later.

Thanks.

hardik3 · June 12, 2023, 7:47am

Hello,

We are also facing a similar issue at our end. We were using DeepStream 6.0 and JetPack 4.6 to do inference using yolov8m, and were able to get good performance for 2 cameras with negligible latency.

When we migrated to JetPack 5.1 and DeepStream 6.2, we observed significant latency on the exact same pipeline with yolov8.

AastaLLL · June 29, 2023, 5:43am

Hi,

Are you seeing the same issue?
In this topic, the detector gets good performance on JetPack 5 but occasionally hits longer latency.

Thanks.

system · July 26, 2023, 2:28am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
