TensorRT Engine Produces Incorrect YOLO Detections vs. ONNX on Jetson Orin Nano

I’m trying to speed up my YOLO inference on a Jetson Orin Nano by converting an ONNX model to a TensorRT engine, but the detection outputs don’t match. The ONNX-to-engine conversion completes without any errors.

What Works: ONNX + OpenCV (C++)

  • Model: yolov9s_simplified.onnx
  • CPP Code: test_onnx.zip (1.7 KB)

What Fails: TensorRT Engine (C++)

Engine build:

/usr/src/tensorrt/bin/trtexec \
  --onnx=yolov9s_simplified.onnx \
  --saveEngine=yolov9s_simplified.engine \
  | tee output_trtexec.log

output_trtexec.log (54.9 KB)

I’m looking for help to identify what’s going wrong in the TensorRT and how to fix it so that the engine produces the same YOLO detections as the ONNX version.

Thanks in advance.

*** Please note that this reply is generated by LLM automatically ***

I’d be happy to help you troubleshoot the issue with your YOLO inference on Jetson Orin Nano.

First, let’s break down the information you provided:

  1. You have an ONNX model that works fine with OpenCV C++.
  2. You converted the ONNX model to a TensorRT engine using the trt.OnnxParser API.
  3. The parser didn’t report any errors, but the TensorRT engine produces different results than the ONNX version.

To identify the issue, let’s compare the two workflows:

ONNX Workflow

  1. You load the ONNX model using OpenCV C++.
  2. You run the model on an input image using OpenCV C++.
  3. You get the expected YOLO detections.

TensorRT Workflow

  1. You convert the ONNX model to a TensorRT engine using trt.OnnxParser.
  2. You build the TensorRT engine using the trtexec command.
  3. You run the TensorRT engine on an input image using trtexec.
  4. You get different YOLO detections than the ONNX version.

Now, let’s consider some potential reasons for the discrepancy:

  1. Model conversion issues: Although the parser didn’t report any errors, there might be issues with the model conversion process. You can try using the trt.OnnxParser API with more verbose logging to see if there are any warnings or errors during the conversion process.
  2. Engine building issues: The trtexec command might not be building the engine correctly. You can try building the engine programmatically using the TensorRT API to have more control over the process.
  3. Input/output differences: Make sure that the input image and output formats are the same for both the ONNX and TensorRT workflows. You can try using the same input image and output format for both workflows to rule out any differences.
  4. Precision issues: TensorRT engines can be built with different precision modes (e.g., FP32, FP16, INT8). Make sure that the precision mode used for the TensorRT engine is the same as the one used for the ONNX model.
  5. Optimization issues: TensorRT engines can be optimized for performance, which might affect the accuracy of the model. You can try disabling optimizations or using a different optimization level to see if it affects the results.

To troubleshoot the issue, you can try the following:

  1. Use the trt.OnnxParser API with more verbose logging to see if there are any warnings or errors during the conversion process.
  2. Build the TensorRT engine programmatically using the TensorRT API to have more control over the process.
  3. Use the same input image and output format for both the ONNX and TensorRT workflows.
  4. Try different precision modes for the TensorRT engine.
  5. Disable optimizations or use a different optimization level for the TensorRT engine.

If none of these suggestions help, please provide more information about your model, input data, and TensorRT engine building process, and I’ll do my best to help you troubleshoot the issue.


Thanks for the suggestions, but I think there’s still some confusion around how I’ve been testing:

ONNX + OpenCV (C++):

Pad → blobFromImage → forward → reshape+transpose → decode → NMS → draw
Uses exactly the same input.jpg and C++ code from my first post
Works: correct boxes

TensorRT + C++:

Serialize ONNX → load engine in C++ → same pad/blob/decode/NMS code → draw
Fails: draws one giant box or nonsense coordinates
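
As a rough illustration of the last part of the decode step in both pipelines, the sketch below maps one box from the 640×640 network input back to the original image. It assumes the model emits boxes as (cx, cy, w, h) in input pixels and that padding is applied at the top-left before resizing; the function name and signature are hypothetical and are not taken from the attached C++ code.

```python
def box_to_original(cx, cy, bw, bh, orig_w, orig_h, input_size=640):
    """Map a center-format box from network input space to original pixels.

    Assumes top-left zero padding to a square of side max(orig_w, orig_h),
    followed by a plain resize to input_size x input_size.
    """
    side = max(orig_w, orig_h)
    scale = side / input_size  # one factor for both axes, since the padded image is square

    def clamp(value, upper):
        return max(0.0, min(float(upper), value))

    x1 = clamp((cx - bw / 2) * scale, orig_w)
    y1 = clamp((cy - bh / 2) * scale, orig_h)
    x2 = clamp((cx + bw / 2) * scale, orig_w)
    y2 = clamp((cy + bh / 2) * scale, orig_h)
    return (x1, y1, x2, y2)


# Example: a 1280x720 image is padded to 1280x1280, so the scale factor is 2.
print(box_to_original(320, 180, 100, 50, 1280, 720))
```

If this mapping is identical in both programs, a box that covers the whole frame more likely points at the raw output tensor than at the decode or NMS code.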

TensorRT via trtexec:

I generated input.raw from the same input.jpg (pad→resize→RGB→normalize→CHW→float32)

#!/usr/bin/env python3
import numpy as np
import cv2

# 1) Read original image
img = cv2.imread("input.jpg")
if img is None:
    raise RuntimeError("Could not open input.jpg")

# 2) Pad to square
h, w = img.shape[:2]
side = max(w, h)
sq = np.zeros((side, side, 3), dtype=np.uint8)
sq[:h, :w] = img

# 3) Resize to 640×640, BGR→RGB, normalize to [0,1]
resized = cv2.resize(sq, (640, 640))
rgb     = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
normed  = rgb.astype(np.float32) / 255.0

# 4) CHW layout and add batch dim → (1,3,640,640)
data = normed.transpose(2, 0, 1)[None, ...]

# 5) Write out as raw float32
data.tofile("input.raw")
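
Before feeding the file to trtexec, it may be worth confirming that input.raw has the size and value range the script above is meant to produce: 1×3×640×640 float32 values, i.e. 4,915,200 bytes, all within [0, 1]. The helper below is a minimal sketch using only the standard library; its name and defaults are hypothetical, and it assumes the file was written in the machine's native byte order.

```python
import array
import os


def check_raw_input(path, shape=(1, 3, 640, 640), lo=0.0, hi=1.0):
    """Check file size and value range of a raw float32 tensor dump."""
    expected = 1
    for dim in shape:
        expected *= dim

    size = os.path.getsize(path)
    if size != expected * 4:  # float32 is 4 bytes per element
        raise ValueError(f"expected {expected * 4} bytes, found {size}")

    values = array.array("f")  # 'f' is a 4-byte float on common platforms
    with open(path, "rb") as handle:
        values.fromfile(handle, expected)

    vmin, vmax = min(values), max(values)
    if vmin < lo or vmax > hi:
        raise ValueError(f"values outside [{lo}, {hi}]: min={vmin}, max={vmax}")
    return vmin, vmax
```

Calling `check_raw_input("input.raw")` would raise if the layout or normalization differs from what the engine input is assumed to expect.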

Ran:

/usr/src/tensorrt/bin/trtexec --loadEngine=yolov9s_simplified.engine --loadInputs=images:input.raw --iterations=1 --avgRuns=10 --exportOutput=trt_output.json

Log output trtexec: trtexec_out.log (7.5 KB)
JSON Data output: trt_output.zip (717.8 KB)

Result: JSON dump of output tensors contains infinities and garbage box coords
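
To quantify "infinities and garbage", a small script can count the non-finite values per output tensor. The sketch below assumes the exported file is a JSON list of objects with "name" and a flat "values" list, and that non-finite numbers may be spelled as bare inf/nan tokens, which strict JSON does not allow; both points are assumptions about the dump format, not something confirmed in this thread.

```python
import json
import math
import re

# Bare inf/nan tokens are not valid JSON; map them to the spellings
# that Python's json module accepts (Infinity, -Infinity, NaN).
_TOKEN = re.compile(r'(?<![\w."])(-?)(inf|nan)(?![\w."])', re.IGNORECASE)


def _normalize(match):
    sign, word = match.group(1), match.group(2).lower()
    return "NaN" if word == "nan" else sign + "Infinity"


def summarize_outputs(json_text):
    """Return per-tensor counts of non-finite values and the finite range."""
    tensors = json.loads(_TOKEN.sub(_normalize, json_text))
    report = {}
    for tensor in tensors:
        values = tensor["values"]
        finite = [v for v in values if math.isfinite(v)]
        report[tensor["name"]] = {
            "count": len(values),
            "non_finite": len(values) - len(finite),
            "min": min(finite) if finite else None,
            "max": max(finite) if finite else None,
        }
    return report
```

For example, `summarize_outputs(open("trt_output.json").read())` would show whether every output is affected or only part of one, which helps narrow down where the values go wrong.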

What I’m really asking:

  • Why does the same model + same preprocessing + same post‑processing give correct detections in OpenCV+ONNX but broken ones in TensorRT?

  • Could it be the ONNX→TRT conversion, or a bug in the Detection/NMS plugin?

Any pointers on debugging the engine itself would be hugely appreciated!

Hi,

Thanks for sharing the detailed source code to check this issue.
We will give it a try and share more info with you.

In the meantime, would you mind helping us to do the following experiments?

  1. Compare the results between ONNX Runtime and TensorRT with Polygraphy.
    It aligns the configuration (e.g., the input) for both backends and gives more information about where the difference comes from.
    TensorRT/tools/Polygraphy at release/10.3 · NVIDIA/TensorRT · GitHub

  2. Upgrade to a newer TensorRT package.
    Since we have fixed some accuracy-related bugs recently, please try this on a newer TensorRT as well.

TensorRT 10.7 GA for JetPack
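
Polygraphy performs the backend comparison itself. The sketch below only illustrates the kind of elementwise tolerance check involved, applied to two flattened output tensors. The function name and the tolerance values are hypothetical, chosen for illustration, and are not Polygraphy's API or defaults.

```python
import math


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-3):
    """Compare two flat lists of floats and summarize the mismatch."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")

    mismatches = 0
    max_abs_diff = 0.0
    worst_index = None
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        if not (math.isfinite(ref) and math.isfinite(cand)):
            mismatches += 1  # any inf/nan counts as a mismatch
            continue
        diff = abs(ref - cand)
        if diff > atol + rtol * abs(ref):
            mismatches += 1
        if diff > max_abs_diff:
            max_abs_diff, worst_index = diff, i

    return {
        "mismatches": mismatches,
        "max_abs_diff": max_abs_diff,
        "worst_index": worst_index,
    }


# Example: one small difference within tolerance, one non-finite value.
print(compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0005, float("inf")]))
```

Running such a check on each layer's output, rather than only on the final tensor, is what helps locate the first layer where the two backends diverge.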

Thanks.

I’ll give Polygraphy a try over the next few days to compare ONNX Runtime vs. TensorRT side-by-side and report back on what I find.

As for upgrading TensorRT, I’m unfortunately locked into v8.5.2 across roughly 400 production devices and cannot migrate at this time.

I’m happy to share the ONNX model and serialized engine if that would help with your testing. Looking forward to any feedback you can provide!

Does the TensorRT model work when loaded in Ultralytics?

Hi,

We tried to reproduce this issue internally, but some data is missing (e.g., the ONNX model and the input data).
Could you share these files, as well as the test command, so we can test it internally?

Thanks.

Hi AastaLLL,

I’ve packaged everything into two archives:

yolov9s_simplified.zip (48.9 MB)

yolov9s_simplified.zip Contains:

  • yolov9s_simplified.onnx
  • yolov9s_simplified.engine

input.zip (158.3 KB)

input.zip Contains:

  • input.jpg (original image)

The code is in the first post. Thank you for your support.

Best Regards.

Hi AastaLLL,

Is there any update, or anything else I can share with you to help reproduce the issue?

Best Regards.

Hi AastaLLL.

Are the uploaded files enough for testing?

Best Regards.

Just to provide a data point, I’m running JetPack 6.2.1 on an Orin Nano devkit with Ultralytics version 8.3.75, and I’m able to run inference on YOLO11 TensorRT models with Python code. It’s very precise.

Thanks for sharing!

In my case I need this to work on L4T 35.3.1 / JetPack 5.1.1 with TensorRT 8.5.2 (v8502), and upgrading isn’t an option (production fleet). Has anyone gotten YOLOv9 running on this stack? If yes, could you share the exact Ultralytics version, the ONNX export flags/opset, and any plugins or builder flags you used? If it’s simply not supported on TRT 8.5.x, a confirmation would also help.

Hi,

Sorry for the late update.
We are trying to reproduce this issue and will get back to you.

Thanks.

Hi @AastaLLL

Thanks, no worries. I’ll wait for your results.

If you need any extra logs, tests, or traces from my setup, just let me know.

Best Regards.

Hi,

Do you have a script that can overlay the TensorRT output (trt_output.json) on the image to visualize the output?

Thanks.

Hi,

Thanks for following up. I don’t have a script to overlay trt_output.json on images—I’m only running inference exactly as in my original post (L4T 35.3.1 / JetPack 5.1.1 with TensorRT 8.5.2), with no visualization step.

On my training PC (RTX 5080) I can get comparable results between ONNX and TensorRT, but that’s a different platform/version. On my production devices I need to stay on TensorRT 8.5.2 with L4T 35.3.1 / JP 5.1.1.

Best regards.

Hi,

How do you verify the output from TensorRT?

We agree that the output looks strange, so there is likely a problem somewhere.
When double-checking your implementation, it seems that the input for TensorRT and OpenCV is not exactly the same.

Could you try loading the raw image through the function below to double-check?

image.copyTo(input_image(cv::Rect(0,0,w,h)));

Thanks.