Issues with Using TensorRT on Jetson Orin Nano

Hello Community,

TensorRT: 10.3.0
NVIDIA GPU: NVIDIA Jetson Orin Nano 8GB - AVerMedia 131S SuperL4T
Nvidia driver version: L4T R36.4.4
cuDNN Version: 9.3.0.75
Operating System: Ubuntu 22.04.5 LTS
Python Version: Python 3.10.12

I am trying to run a segmentation-based computer vision model on a Jetson Orin Nano industrial kit with 4 RealSense cameras attached. I am using an ONNX model with a 1088Hx1920W input at FP16. I am hitting a memory constraint: TensorRT fails during the engine build because it cannot find any kernel/tactic that both supports the node and fits in the memory currently available:

2026-03-10 14:10:14.842034592 [W:onnxruntime:Default, tensorrt_execution_provider.h:92 log] [2026-03-10 13:10:14 WARNING] Detected layernorm nodes in FP16.
2026-03-10 14:10:14.842109632 [W:onnxruntime:Default, tensorrt_execution_provider.h:92 log] [2026-03-10 13:10:14 WARNING] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
2026-03-10 14:10:22.426030400 [W:onnxruntime:Default, tensorrt_execution_provider.h:92 log] [2026-03-10 13:10:22 WARNING] Tactic Device request: 161MB Available: 144MB. Device memory is insufficient to use tactic.
2026-03-10 14:10:22.437048032 [W:onnxruntime:Default, tensorrt_execution_provider.h:92 log] [2026-03-10 13:10:22 WARNING] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 169205760 detected for tactic 0x0000000000000000.

This same issue does not occur when I use the CUDA execution provider to initialise my model. How can I get the TensorRT execution provider working for this computer vision project while keeping the same input resolution (1088Hx1920W)? I have already trained on a lot of images, and it would be difficult to recreate the ONNX weights file from scratch.

Hi @toshanmajumdar,

This is a classic Out-Of-Memory issue.

To answer your underlying question: This is happening because of the TensorRT build phase, combined with the Jetson’s Unified Memory.

Unlike the CUDA Execution Provider, which mostly runs standard pre-compiled kernels, TensorRT performs an aggressive “auto-tuning” phase when you first initialize it. For each layer it benchmarks many candidate kernel implementations (tactics) to find the fastest one for your specific Jetson. Testing these tactics on large 1088x1920 layers requires big chunks of temporary workspace memory.

Since the Orin Nano shares its 8GB of RAM with Ubuntu and the CPU, you just don’t have enough free contiguous memory for TensorRT to test its tactics (as seen in your log: request: 161MB Available: 144MB), causing the build to fail.
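For reference, the byte count in the UNSUPPORTED_STATE line is the same request as the “161MB” in the line before it, assuming TensorRT’s “MB” here means MiB (the numbers are consistent with that). The shortfall for this one tactic is only about 17 MiB, which is why freeing memory elsewhere can be enough:

$$
\frac{169\,205\,760\ \text{B}}{2^{20}\ \text{B/MiB}} \approx 161.4\ \text{MiB} \;>\; 144\ \text{MiB available} \quad\Rightarrow\quad \text{shortfall} \approx 17\ \text{MiB}
$$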

Since you need to keep the 1088x1920 resolution, the goal is to successfully get through the memory-heavy engine build once. Inference itself uses much less memory.

Here are the best ways to push the build process through:

1. Pre-build the Engine using trtexec

Loading Python, ONNX Runtime, and your script takes up valuable RAM. You can bypass this overhead by building the TensorRT engine directly from the command line using NVIDIA’s native tool.
Run this in your terminal:

/usr/src/tensorrt/bin/trtexec --onnx=your_model.onnx --saveEngine=your_model.engine --fp16

Once trtexec successfully builds your_model.engine, you can load that engine file directly with the TensorRT runtime (for example, via the tensorrt Python API) instead of letting ONNX Runtime build it at startup. This skips the heavy auto-tuning phase entirely at runtime.
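If you would rather drive the pre-build and the engine load from Python, here is a minimal sketch that wraps the same trtexec command and then deserializes the resulting plan with the TensorRT Python runtime. It assumes the `tensorrt` package that ships with JetPack is importable on the device (it is imported lazily, so the command builder works without it); the helper names are illustrative, not part of any NVIDIA API.

```python
import subprocess

# Default trtexec location on JetPack, as in the command above.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = True) -> list[str]:
    """Return the trtexec argument list for an offline engine build."""
    cmd = [TRTEXEC, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")
    return cmd


def build_engine(onnx_path: str, engine_path: str) -> None:
    """Run trtexec; raises CalledProcessError if the build fails (e.g. OOM-killed)."""
    subprocess.run(trtexec_command(onnx_path, engine_path), check=True)


def load_engine(engine_path: str):
    """Deserialize a pre-built plan with the TensorRT Python runtime."""
    import tensorrt as trt  # lazy import: only needed on the Jetson

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError(f"could not deserialize {engine_path}")
    # TensorRT 10 uses name-based I/O tensors instead of binding indices.
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        print(name, engine.get_tensor_mode(name), engine.get_tensor_shape(name))
    return runtime, engine  # keep the runtime alive alongside the engine
```

Inference then goes through `engine.create_execution_context()` with device buffers bound via `set_tensor_address`, which is left out here to keep the sketch short.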

2. Maximize Available Memory (Headless Mode)

The Ubuntu GUI eats up about 1GB to 1.5GB of your shared RAM. Temporarily disable it to give TensorRT maximum breathing room during the engine build.

  • Disable GUI: sudo systemctl set-default multi-user.target and reboot.
  • Build your engine.
  • Restore GUI: sudo systemctl set-default graphical.target and reboot.
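To see how much headroom the build actually gets, you can read `MemAvailable` and `SwapFree` from `/proc/meminfo` before and after switching to headless mode. This is a minimal standard-library sketch; the helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: value in KiB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])  # the kernel reports these in kB (KiB)
    return info


def headroom_gib(info: dict[str, int]) -> tuple[float, float]:
    """Return (MemAvailable, SwapFree) in GiB."""
    kib_per_gib = 1024 * 1024
    return info.get("MemAvailable", 0) / kib_per_gib, info.get("SwapFree", 0) / kib_per_gib
```

On the Jetson, call `headroom_gib(parse_meminfo(open("/proc/meminfo").read()))` right before starting trtexec to confirm the GUI memory was actually freed.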

3. Increase Your Swap Space

Since the Orin Nano relies on shared memory, setting up a large swap file (8GB to 16GB) on your NVMe acts as a safety net during the heavy build phase. The build will take longer as it pages to the SSD, but it will prevent the OOM crash.
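The exact swap setup isn’t spelled out above, so here is one common Linux recipe, expressed as the commands a provisioning script could run as root. The 16 GiB size and the `/swapfile` path are assumptions; point `path` at a file on the NVMe drive.

```python
import subprocess


def swapfile_commands(path: str = "/swapfile", size_gib: int = 16) -> list[list[str]]:
    """Standard Linux steps to create and enable a swap file."""
    return [
        ["fallocate", "-l", f"{size_gib}G", path],  # reserve the space on disk
        ["chmod", "600", path],                     # restrict to root-only access
        ["mkswap", path],                           # write the swap signature
        ["swapon", path],                           # enable it immediately
    ]


def fstab_entry(path: str = "/swapfile") -> str:
    """Line to append to /etc/fstab so the swap file survives reboots."""
    return f"{path} none swap sw 0 0"


def enable_swap(path: str = "/swapfile", size_gib: int = 16) -> None:
    """Run each step with sudo; stops at the first failing command."""
    for cmd in swapfile_commands(path, size_gib):
        subprocess.run(["sudo", *cmd], check=True)
```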

4. Address the ONNX Opset Warning

Your logs show a warning about FP16 layernorm nodes potentially overflowing. This is primarily an accuracy warning, but it also means LayerNorm was exported as a chain of separate Reduce/Pow/elementwise ops. If possible, re-export your model to ONNX using opset 17 or higher. LayerNorm can then be exported as a single LayerNormalization node, which TensorRT maps to its INormalizationLayer; that is more stable in FP16 and gives TensorRT fewer separate layers to tune.
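Before re-exporting, it can help to check which opset the current file uses and whether any `LayerNormalization` nodes survived the export. This sketch assumes the `onnx` Python package is installed (imported lazily inside `inspect_model`); the helper names are hypothetical. Re-exporting reuses your already-trained weights, so no retraining should be needed; for a PyTorch model this is typically the `opset_version=17` argument of `torch.onnx.export`. Simply bumping the opset number with a converter is unlikely to fuse a layernorm that was already decomposed into separate ops.

```python
def default_opset(opset_imports) -> int:
    """Return the version of the default ONNX domain ("" or "ai.onnx")."""
    for entry in opset_imports:
        if entry.domain in ("", "ai.onnx"):
            return entry.version
    raise ValueError("model has no default-domain opset import")


def count_op(nodes, op_type: str) -> int:
    """Count top-level graph nodes of a given op_type (subgraphs are not visited)."""
    return sum(1 for node in nodes if node.op_type == op_type)


def inspect_model(path: str) -> tuple[int, int]:
    import onnx  # lazy import; requires the onnx package

    model = onnx.load(path)
    opset = default_opset(model.opset_import)
    n_layernorm = count_op(model.graph.node, "LayerNormalization")
    print(f"opset={opset}, LayerNormalization nodes={n_layernorm}")
    if opset < 17 or n_layernorm == 0:
        print("LayerNorm is probably decomposed; consider re-exporting with opset >= 17")
    return opset, n_layernorm
```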

Try the trtexec route with the GUI disabled first; that usually gets these high-resolution models over the finish line on 8GB Jetsons!

If not, I’ll refer this to the Jetson community for a closer look.

Thank you!

Hi @athkumar

Thank you for your response.

I was successful in pre-building the TensorRT engine on the Jetson and then running it for my vision project from a Docker container. However, during startup I get the following warning about using a TensorRT engine across different device models/hardware. Is there any way to make this more reliable for deployment across a fleet of multiple Jetsons?

###################################### PROGRAM RUN START ######################################################
2026-04-02 09:36:56,613 [INFO] Loaded config from: /app/config_files/aruco_parameters.json
2026-04-02 09:36:56,613 [INFO] Config | fps=15 | queue=2 | model=(1920x1088) | method=frame_after_request
2026-04-02 09:36:56,614 [INFO] RS_RECORD_BAG=0
2026-04-02 09:36:57,083 [INFO] [cleanup_on_start] detected: Intel RealSense D435I | serial=215222073903 | firmware=5.13.0.50
2026-04-02 09:36:57,083 [INFO] [cleanup_on_start] detected: Intel RealSense D435I | serial=233722070556 | firmware=5.17.0.10
2026-04-02 09:36:57,084 [INFO] [cleanup_on_start] detected: Intel RealSense D435I | serial=147122073791 | firmware=5.13.0.50
2026-04-02 09:36:57,084 [INFO] [cleanup_on_start] detected: Intel RealSense D435I | serial=313522072245 | firmware=5.13.0.50
2026-04-02 09:36:57,165 [INFO] …Starting Deploy Mode…
2026-04-02 09:36:57,166 [INFO] Saving detection images to: deploy_results
2026-04-02 09:36:57,167 [INFO] Loading model from: onnx_model/weights/best_static_1088hx1920w_fp16_argmax.engine
Available ORT providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
[04/02/2026-09:36:57] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TRT] input tensor: input, shape=(1, 3, 1088, 1920)
[TRT] output tensor: argmax_output_casted, shape=(1, 1088, 1920)
Using providers: ['TensorRT_Engine']
[ORT OUTPUT] shape: (1, 1088, 1920) dtype: float16
2026-04-02 09:36:58,350 [INFO] Warmup done.

Hi @toshanmajumdar,

Awesome to hear the pre-build worked out! Now that you’re moving toward fleet deployment, this is exactly the right question to be asking.

What That Warning Means

If you are seeing the warning: [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

it means TensorRT has noticed a hardware discrepancy. Engine plans aren’t really portable, even between two devices that look identical on a spec sheet. When trtexec built your engine, it ran an auto-tuning phase that profiled hundreds of CUDA kernel tactics against your specific Jetson’s SM count, memory bandwidth, and clock profile, and then baked the winning tactics into the .engine file.

At deserialization time, TRT compares a fingerprint of the deploy device against the plan:

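  • Compute capability mismatch: Hard failure; it refuses to load.
  • TensorRT version mismatch: Also a hard failure by default; a plan is tied to the TensorRT version that built it.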
  • Compute capability matches but device model differs: This triggers the warning you are seeing.

Even subtle differences (Orin Nano 4GB vs 8GB vs Super, board revision, JetPack/BSP version, memory vendor SKU) are enough to trip the second case. Compute capability tells TRT “this kernel will run” but not “this kernel is optimal.”

Recommended Path for Fleets: Build-on-Target with Caching

Don’t ship a pre-built engine in your container. Instead, ship the ONNX model and have the container build the engine the first time it runs on each device.

Here is the standard pattern for TRT in production:

  1. Mount a persistent volume into the container (e.g., /data/engines/).
  2. On startup, your entrypoint checks for model.engine in that directory.
  3. If missing, run: /usr/src/tensorrt/bin/trtexec --onnx=your_model.onnx --saveEngine=/data/engines/model.engine --fp16
  4. If present, load and run.
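Steps 1 to 4 can be sketched as a small Python entrypoint helper. It builds into a temporary file and renames it into place only after trtexec succeeds, so a build killed mid-way (for example by the OOM killer) cannot leave a truncated `model.engine` that step 4 would then try to load. The `run` parameter exists only so the logic can be tested without TensorRT; the paths are the placeholders from the steps above.

```python
import subprocess
from pathlib import Path

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def ensure_engine(onnx_path: str, engine_path: str, run=subprocess.run) -> bool:
    """Build the engine if it is missing; return True if a build was run."""
    engine = Path(engine_path)
    if engine.is_file() and engine.stat().st_size > 0:
        return False  # step 4: cached engine found, load and run it
    engine.parent.mkdir(parents=True, exist_ok=True)
    partial = engine.with_name(engine.name + ".partial")
    # Step 3: build into a temporary file first.
    run(
        [TRTEXEC, f"--onnx={onnx_path}", f"--saveEngine={partial}", "--fp16"],
        check=True,  # a failed or killed build raises instead of continuing
    )
    partial.replace(engine)  # rename into place only after a successful build
    return True
```

Call `ensure_engine("your_model.onnx", "/data/engines/model.engine")` at container start, then load the engine as usual.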

You pay the ~5–15 min build cost once per device, the warning goes away, and every Jetson gets a perfectly-tuned engine.
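One optional refinement, given how sensitive plans are to the TensorRT version and the board: encode the model hash, TensorRT version, and device model in the engine filename, so a JetPack upgrade or a volume moved to a different board triggers a rebuild instead of loading a stale plan. The naming scheme below is hypothetical; it assumes `/proc/device-tree/model` exists (as on Jetson boards) and that you pass in `tensorrt.__version__`. The resulting name could replace `model.engine` in step 2.

```python
import hashlib
import re
from pathlib import Path


def engine_filename(stem: str, onnx_bytes: bytes, trt_version: str, device_model: str) -> str:
    """Hypothetical scheme: one engine per (ONNX content, TensorRT version, board)."""
    model_hash = hashlib.sha256(onnx_bytes).hexdigest()[:12]
    # Turn e.g. "NVIDIA Jetson Orin Nano ..." into a filesystem-friendly slug.
    device = re.sub(r"[^A-Za-z0-9]+", "-", device_model).strip("-").lower()
    return f"{stem}-{model_hash}-trt{trt_version}-{device}.engine"


def jetson_device_model() -> str:
    """Read the board name from the device tree (NUL-terminated on Jetson)."""
    return Path("/proc/device-tree/model").read_text().rstrip("\x00").strip()
```

Usage on the device: `engine_filename("best", Path("best.onnx").read_bytes(), tensorrt.__version__, jetson_device_model())`.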

⚠️ Heads up: The first-boot build on each device will hit the exact same OOM ceiling we worked through earlier! Make sure your provisioning flow disables the GUI and adds an 8–16GB swap file before the first trtexec run; otherwise, the build will crash on the fresh devices.

Using ONNX Runtime TensorRT EP

If you’re using the ONNX Runtime TensorRT EP, it has this build-and-cache feature built in. You just need to set the following provider options:

```python
trt_options = {
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "/data/engines/",
}
```

ORT will handle the build-on-first-run and the cache-lookup-thereafter automatically!
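Put together, a session created with these options might look like the sketch below. It assumes an `onnxruntime-gpu` build that includes the TensorRT EP (your provider list shows it does), adds `trt_fp16_enable` to match the `--fp16` trtexec builds, and keeps the CUDA and CPU providers as fallbacks for any nodes TensorRT does not take. The helper names are hypothetical; the cache directory is the same `/data/engines/` volume as above.

```python
import os


def trt_provider_options(cache_dir: str = "/data/engines/", fp16: bool = True) -> dict:
    """TensorRT EP options: build on first run, reuse the cached engine afterwards."""
    return {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_dir,
        "trt_fp16_enable": fp16,
    }


def make_session(onnx_path: str, cache_dir: str = "/data/engines/"):
    import onnxruntime as ort  # needs an onnxruntime-gpu build with the TensorRT EP

    os.makedirs(cache_dir, exist_ok=True)  # persistent volume mounted into the container
    providers = [
        ("TensorrtExecutionProvider", trt_provider_options(cache_dir)),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT does not take
        "CPUExecutionProvider",
    ]
    session = ort.InferenceSession(onnx_path, providers=providers)
    print("active providers:", session.get_providers())
    return session
```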

Good luck with the rollout!

Thank you!