Timing Issue with TensorRT Model on Jetson Orin Nano Using TensorFlow Framework

Hello,

I have encountered a puzzling issue while benchmarking the inference time of a TensorRT model running on the Jetson Orin Nano. I converted the model within the TensorFlow framework (TF-TRT) and ran it with the following Python code:

import tensorflow as tf
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import matplotlib.pyplot as plt
import time

saved_model_dir = "tensorRT-model-FP16"
model = tf.saved_model.load(saved_model_dir)
infer = model.signatures['serving_default']
image = np.random.random((1, 1200, 1920, 1)).astype(np.float32)
image = tf.convert_to_tensor(image)

# Warmup phase
for i in range(100):
    outputs = infer(inputs=image)

# Benchmark phase
benchmark_runs = 1000
start_event = cuda.Event()
end_event = cuda.Event()
timings = []

for i in range(benchmark_runs):
    # time.sleep(0.01)  # Uncommenting this changes the behavior
    start_event.record()
    outputs = infer(inputs=image)
    end_event.record()
    end_event.synchronize()
    start_event.synchronize()
    elapsed_time = start_event.time_till(end_event)  # elapsed time in milliseconds
    timings.append(elapsed_time)

print("Average inference time:", np.mean(timings))

plt.plot(np.arange(len(timings)), timings)
plt.savefig("diagram.png")

The issue arises when I uncomment the time.sleep(0.01) line in the benchmarking loop:

  1. Without time.sleep(0.01): The average inference time (np.mean(timings)) is approximately 9 ms.
  2. With time.sleep(0.01): The average inference time drops drastically to about 0.5 ms.

I plotted the timing distributions for both cases, and the diagrams are attached below:

An additional interesting observation: when I optimize the model with the tf.experimental.tensorrt.Converter library and pass real data to converter.build(input_fn=my_input_fn) instead of random images, the execution timing plot (without time.sleep(0.01)) looks something like this:

Why does adding a time.sleep reduce the reported inference time? Is this an artifact of the CUDA event timing mechanism, or does it relate to TensorRT’s execution pipeline and synchronization? Could it also be due to a thermal issue on my Jetson device or hardware limitations? Interestingly, when I use a 1080 Ti GPU, the execution time remains consistent regardless of whether the delay is present. Any insights or recommendations for achieving accurate timing measurements would be greatly appreciated.
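
For reference, a host-side timing variant of the same loop would look roughly like this. It is only a sketch: it reuses infer, image, and benchmark_runs from the code above, and it forces the GPU work to finish by copying one output tensor back to the host before stopping the timer.

import time
import numpy as np

timings = []
for i in range(benchmark_runs):
    t0 = time.perf_counter()
    outputs = infer(inputs=image)
    # Block until the GPU work is done by pulling one output tensor to the host
    _ = list(outputs.values())[0].numpy()
    t1 = time.perf_counter()
    timings.append((t1 - t0) * 1000.0)  # milliseconds

print("Average inference time (ms):", np.mean(timings))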

Note: I created the TensorRT model using tf.experimental.tensorrt.Converter with precision_mode set to FP16. However, when I run the model with NVIDIA’s native TensorRT, it consistently runs in 17.5 milliseconds.
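
For reference, a minimal sketch of this conversion flow (the input SavedModel path below is a placeholder, and my_input_fn yields a random image of the model's input shape; in the "real data" case it yields actual images instead):

import numpy as np
import tensorflow as tf

# Convert the original SavedModel into a TF-TRT SavedModel with FP16 precision
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir="saved-model",  # placeholder path to the original SavedModel
    precision_mode="FP16",
)
converter.convert()

def my_input_fn():
    # Inputs used to pre-build the TensorRT engines
    yield (np.random.random((1, 1200, 1920, 1)).astype(np.float32),)

converter.build(input_fn=my_input_fn)
converter.save("tensorRT-model-FP16")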

You can download the model from this link:
tensorRT-model-FP16.zip (331.0 KB)

Hi,

It looks more like the Python compiler has done some optimization when the sleep is added.
When you use the real input, have you verified that both cases produce valid output?

Thanks.

Thank you for your prompt response.
Yes, I validated the output, and it was fine.
If the optimization is from Python compiler itself, why doesn’t this issue occur on the 1080 Ti, which consistently takes a fixed time to run the model?

Additionally, this behavior is not specific to time.sleep.
Instead of sleeping, I loaded an image between runs, which also made the model run fast because of the gap introduced between each run.
It seems that introducing any delay between iterations makes the model run faster on the Jetson.
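
For illustration, the modified loop looks roughly like this (the image file name and decode call are placeholders; it otherwise reuses the names from the benchmark code above):

for i in range(benchmark_runs):
    # Load and decode an image between runs instead of sleeping
    _ = tf.io.decode_png(tf.io.read_file("sample.png"))  # placeholder file
    start_event.record()
    outputs = infer(inputs=image)
    end_event.record()
    end_event.synchronize()
    timings.append(start_event.time_till(end_event))  # milliseconds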

Hi,

Thanks for the info.

We need to give it a try to get more info about the issue.
Will provide more info to you later.

Thanks.

Hi,

Could you share which TensorFlow package you use?
Do you use our prebuilt package, which is available at the link below?

Moreover, are you able to test whether this issue depends on the TensorFlow library?
For example, by changing to another inference framework (like TensorRT)?

Thanks.

I installed this TensorFlow version from your source:
https://developer.download.nvidia.com/compute/redist/jp/v61/tensorflow/tensorflow-2.16.1+nv24.08-cp310-cp310-linux_aarch64.whl.
When I use TensorRT, this problem does not exist. The timing remains constant under all conditions.
All the versioning is correct, and I can use CUDA with the TensorFlow library.
I think my problem is very similar to the issue described here:
Strange jumping results on FPS and inference time

Hi,

If you don’t meet the same issue with TensorRT, it’s recommended to contact the TensorFlow team as well, since this issue might relate to some sync-up / scheduling mechanism inside TensorFlow.

Thanks.

