Inconsistent TensorRT Inference Time on Jetson Xavier NX

I am seeking assistance with an issue I am experiencing with TensorRT inference times on my NVIDIA Jetson Xavier NX board. When running a TensorRT engine, the inference time occasionally increases dramatically: a run sometimes completes in about 25 ms, but subsequent runs can take 45 ms or even 65 ms. (A minimal sketch of how I time each run follows the list below.)

Observations and Details:

  • Device: NVIDIA Jetson Xavier NX
  • Issue Description:
    • Initial inference time: ~25 ms
    • Subsequent inference times: ~45 ms to ~65 ms
  • Model: YOLOv5 object detection model (yolov5.engine)
  • Power Supply: 5 V, 4 A
  • JetPack Version: 5.1.2
  • Temperature: maximum observed temperature is 40-42°C
  • Cooling Measures: the board is equipped with a heat sink and a fan
  • Clocks: locked to maximum with sudo jetson_clocks; all 6 CPU cores are active
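Roughly how I time each run (simplified; infer_once() is a stand-in for my real TensorRT call, not the actual pipeline):

```python
import statistics
import time

def infer_once():
    # Stand-in for one synchronous TensorRT forward pass; in the real
    # script this wraps context.execute_v2(bindings) plus the stream sync.
    time.sleep(0.025)

def time_inference(runs=50, warmup=10):
    # Warm-up runs so allocation and clock ramp-up do not skew the numbers.
    for _ in range(warmup):
        infer_once()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer_once()  # must block until the GPU work has finished
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    print(f"min {min(samples):.1f} ms | "
          f"median {statistics.median(samples):.1f} ms | "
          f"max {max(samples):.1f} ms")

time_inference()
```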

Does the time increase when you wait longer between inferences?

I did not get you. Could you please elaborate on your statement a little bit?

Is the latency shorter when you run inferences one after another quickly, and longer if you wait some time before making the next inference?

It is longer when I wait some time before doing another inference.
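A rough way to quantify that pattern, reusing the same hypothetical infer_once() stand-in from the first sketch (the gap values here are arbitrary):

```python
import time

def infer_once():
    # Stand-in for one synchronous TensorRT forward pass; replace with
    # the real context.execute_v2(bindings) call plus stream sync.
    time.sleep(0.025)

def median_latency_ms(gap_s, runs=20):
    # Idle for gap_s before each timed run, so anything that happens
    # during the pause (e.g. clocks ramping down) shows up in the numbers.
    samples = []
    for _ in range(runs):
        time.sleep(gap_s)
        t0 = time.perf_counter()
        infer_once()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

print("back-to-back:", median_latency_ms(0.0), "ms")
print("2 s idle gap:", median_latency_ms(2.0), "ms")
```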


It occurs with TensorRT; I am not sure about the reason.