Hello,
I have encountered a puzzling issue while benchmarking the inference time of a TensorRT model running on a Jetson Orin Nano. I converted the model with TensorFlow's TF-TRT converter (`tf.experimental.tensorrt.Converter`) and ran it with the following Python code:
```python
import tensorflow as tf
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import matplotlib.pyplot as plt
import time

saved_model_dir = "tensorRT-model-FP16"
model = tf.saved_model.load(saved_model_dir)
infer = model.signatures['serving_default']

# Random input matching the model's expected shape
image = np.random.random((1, 1200, 1920, 1)).astype(np.float32)
image = tf.convert_to_tensor(image)

# Warmup phase
for i in range(100):
    outputs = infer(inputs=image)

# Benchmark phase: time each call with CUDA events
benchmark_runs = 1000
start_event = cuda.Event()
end_event = cuda.Event()
timings = []
for i in range(benchmark_runs):
    # time.sleep(0.01)  # Uncommenting this changes the behavior
    start_event.record()
    outputs = infer(inputs=image)
    end_event.record()
    end_event.synchronize()
    start_event.synchronize()
    elapsed_time = start_event.time_till(end_event)  # milliseconds
    timings.append(elapsed_time)

print("Average inference time (ms):", np.mean(timings))
plt.plot(np.arange(len(timings)), timings)
plt.savefig("diagram.png")
```
The issue arises when I uncomment the `time.sleep(0.01)` line in the benchmarking loop:

- Without `time.sleep(0.01)`: the average inference time (`np.mean(timings)`) is approximately 9 ms.
- With `time.sleep(0.01)`: the average inference time drops drastically to about 0.5 ms.
I plotted the per-iteration timings for both cases; the diagrams are attached below:

- With `time.sleep(0.01)`: [timing plot]
- Without `time.sleep(0.01)`: [timing plot]
An additional interesting observation: when I optimize the model with `tf.experimental.tensorrt.Converter` and pass real data through `converter.build(input_fn=my_input_fn)` instead of random images, the execution-timing plot (without `time.sleep(0.01)`) looks like this: [timing plot]
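For context, the conversion itself follows the standard TF-TRT flow; a rough sketch is below (the input SavedModel path and the exact `input_fn` are simplified placeholders, not my exact script):

```python
import numpy as np
import tensorflow as tf

# Sketch of the TF-TRT conversion (paths and input_fn are placeholders).
params = tf.experimental.tensorrt.ConversionParams(precision_mode="FP16")
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir="saved-model",   # original (non-TRT) SavedModel
    conversion_params=params,
)
converter.convert()

def my_input_fn():
    # Yields tuples of input tensors; here random data with the model's input shape.
    # In the "real data" experiment this yields actual images instead.
    yield (tf.constant(np.random.random((1, 1200, 1920, 1)).astype(np.float32)),)

converter.build(input_fn=my_input_fn)      # pre-builds the TensorRT engines
converter.save("tensorRT-model-FP16")
```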
Why does adding a `time.sleep` reduce the reported inference time? Is this an artifact of the CUDA event timing mechanism, or does it relate to TensorRT's execution pipeline and synchronization? Could it also be a thermal issue on my Jetson device or a hardware limitation? Interestingly, when I run the same script on a 1080 Ti GPU, the execution time stays consistent regardless of whether the sleep is there. Any insights or recommendations for achieving accurate timing measurements would be greatly appreciated.
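To make the synchronization part of the question concrete, this is the kind of variant I am wondering about: wall-clock timing that explicitly waits for the GPU work by pulling an output back to the host each iteration (just a sketch reusing `infer` and `image` from the script above, not something I have validated yet):

```python
import time
import numpy as np

def benchmark_host_side(infer, image, runs=1000):
    """Wall-clock timing; copying an output to the host blocks until the GPU is done."""
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        outputs = infer(inputs=image)
        _ = next(iter(outputs.values())).numpy()  # forces completion of this call
        timings.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    return np.mean(timings)
```

The idea is that `.numpy()` forces the result back to the host, so the measured interval should include all of the GPU work for that call.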
Note: I created the TensorRT model using `tf.experimental.tensorrt.Converter` with `precision_mode` set to FP16. However, when I benchmark with Nvidia's own (standalone) TensorRT, it consistently runs in 17.5 milliseconds.
You can download the model from this link:
tensorRT-model-FP16.zip (331.0 KB)