Incorrect inference results: ONNX/PyTorch/TensorRT/C++ on Xavier AGX


I have a trained PyTorch model for object detection that takes a bird’s eye view (BEV) of a lidar point cloud as input and produces keypoint masks for the cars in the BEV.

Everything runs fine in PyTorch, and I export the model to ONNX.
I then run inference with ONNX Runtime, and the results are also correct.

src.7z (12.2 KB)
torch-onnx-standalone (1).zip (53.7 MB)
bevdetnet_3x320x480.onnx (57.3 MB)

Attached is a standalone project that includes a sample BEV input as an .npy file, some reference outputs that are known to be correct, and the ONNX model itself.

Now I want to run inference in C++ using onnx-tensorrt on the Xavier AGX dev kit.
I am able to build the engine with trtexec and run inference, but all the results appear to be wrong.

I have also attached the complete C++ source code.

Please help me understand what’s going wrong.

Best Regards


Thanks for reporting this to us.
First, to clarify: do you use the TensorRT API directly, or ONNX Runtime with TensorRT acceleration?



Is this still an issue that needs our help?


I use the TensorRT C++ API.
I ran a pure ONNX Runtime (CPU) test only to confirm that the ONNX export works; it does not involve TensorRT at all.


I followed the Quick Start semantic segmentation example. I think the confusion mainly lies in copying data to and from the GPU. My input is a 3-channel BEV image of size 480x320 (the sizes and number of channels may change later; a fixed batch size of 1 is fine for now). The pixel values are floats, which are normalized to [0, 1] by dividing each channel by its maximum element. I build the input from row-major Eigen matrices: a class constructs an Eigen tensor (an array of 3 matrices), as you can see in the attached code.
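A minimal sketch of the input packing, assuming the engine's input binding is a contiguous NCHW float tensor with batch size 1 (the usual layout for a PyTorch-exported ONNX model; the file name bevdetnet_3x320x480.onnx suggests C=3, H=320, W=480, but that is an inference). The helper name `packPlanarNormalized` and the use of `std::vector` instead of Eigen are illustrative assumptions. With Eigen, each channel matrix must be declared with `Eigen::RowMajor` (Eigen is column-major by default) so that its `data()` pointer walks the pixels row by row; a column-major plane would be read column by column and silently scramble the input.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: packs C channel planes (each H*W floats, row-major)
// into one contiguous CHW buffer and divides each channel by its own maximum,
// mirroring the "divide by max element channel-wise" normalization.
std::vector<float> packPlanarNormalized(const std::vector<std::vector<float>>& channels,
                                        std::size_t H, std::size_t W) {
    const std::size_t plane = H * W;
    if (plane == 0) {
        throw std::invalid_argument("H and W must be non-zero");
    }
    std::vector<float> out;
    out.reserve(channels.size() * plane);
    for (const auto& ch : channels) {
        if (ch.size() != plane) {
            throw std::invalid_argument("channel plane has wrong size");
        }
        const float maxVal = *std::max_element(ch.begin(), ch.end());
        for (float v : ch) {
            // Divide (rather than multiply by a reciprocal) to match a Python
            // "x / x.max()" reference as closely as possible; an all-zero
            // channel stays zero instead of dividing by zero.
            out.push_back(maxVal > 0.0f ? v / maxVal : 0.0f);
        }
    }
    return out;  // element (c, y, x) lives at index c*H*W + y*W + x
}
```

The returned buffer is already in the final device layout, so it can be copied to the input binding with a single `cudaMemcpy` of `out.size() * sizeof(float)` bytes, with no further casts.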

The expected outputs are also all floats, with sizes 480x320x4, 480x320x37, 480x320x3, and 480x320x2.
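A minimal sketch of reading the outputs back, assuming each output binding is a contiguous NCHW float tensor with batch size 1. Under that assumption, the "480x320x4" output arrives as 4 planes of 480x320 values rather than 4 interleaved values per pixel, so if the reference outputs were saved in HWC order they need a transpose before comparison. The helper names and the choice of 480 as the height are assumptions; only the channel counts (4, 37, 3, 2) come from the post.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts a CHW buffer (as written for an NCHW binding
// with batch size 1) into HWC order, where the C values of a pixel are adjacent.
// Index mapping: chw[c*H*W + y*W + x] -> hwc[(y*W + x)*C + c].
std::vector<float> chwToHwc(const std::vector<float>& chw,
                            std::size_t C, std::size_t H, std::size_t W) {
    if (chw.size() != C * H * W) {
        throw std::invalid_argument("buffer size does not match C*H*W");
    }
    std::vector<float> hwc(chw.size());
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t y = 0; y < H; ++y)
            for (std::size_t x = 0; x < W; ++x)
                hwc[(y * W + x) * C + c] = chw[c * H * W + y * W + x];
    return hwc;
}

// Element counts for the four outputs (batch size 1). The spatial size is an
// assumption; the channel counts are the ones listed in the post.
constexpr std::size_t kH = 480, kW = 320;
constexpr std::size_t kOutChannels[4] = {4, 37, 3, 2};

std::size_t outputElementCount(std::size_t i) {
    return kOutChannels[i] * kH * kW;
}
```

Each host output buffer then needs `outputElementCount(i) * sizeof(float)` bytes, which is also the byte count for the device-to-host copy.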

A simple example of how to do this without many complicated conversions and type casts would help me a lot.

Sorry for the delayed response, and please let me know if I can help further in debugging this issue.

Best Regards


Is there any update to this yet?

Best Regards


We recommend checking our /usr/src/tensorrt/samples/sampleMNIST sample first.

Although it reads images from .pgm files, it converts the data to floating point and subtracts the mean.
Please check the sample for some ideas for your use case.
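A sketch of that kind of preprocessing, assuming 8-bit input pixels and a per-pixel mean image of the same size. It is written for illustration only and is not the sample's actual code; the function name `toFloatMinusMean` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts raw 8-bit pixels to float and subtracts a
// per-pixel mean, producing a float buffer that can be copied to the engine input.
std::vector<float> toFloatMinusMean(const std::vector<std::uint8_t>& pixels,
                                    const std::vector<float>& mean) {
    if (pixels.size() != mean.size()) {
        throw std::invalid_argument("pixel and mean sizes differ");
    }
    std::vector<float> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out[i] = static_cast<float>(pixels[i]) - mean[i];
    }
    return out;
}
```

For a BEV input that is already float, only the arithmetic step applies; the point is that the host buffer handed to TensorRT is plain contiguous `float` data.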



Thanks for the hint. However, I have already followed the sampleOnnxMNIST example as well as the Quick Start Guide. The code I shared is essentially a copy of the Quick Start Guide code with only the necessary changes.

I would be very grateful if you could suggest some alternatives or take a look at the code I shared.

Recently, I rebuilt the TensorRT engine using trtexec with verbose mode on.
I noticed messages like “ConvTransposed2D does not have an equivalent in this tactic, skipping…”

Could this be causing the problem?
My network uses transposed convolutions with sigmoid activations in between.

I am really lost here, since I have no way to debug what goes wrong once the input is handed over to TensorRT.
Please help me figure it out.

Best Regards


Sorry for the late update.
If the message is only a warning, it should not affect accuracy; at most it has some performance impact.

We will check this internally.
If we compare the TensorRT and ONNX Runtime results, we should be able to reproduce this issue.
Is that correct?
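A minimal sketch of such a comparison, assuming both results have been loaded into float buffers in the same layout. It reports the maximum absolute difference and counts elements that fail the same element-wise test NumPy's `allclose` uses, |a - b| <= atol + rtol * |b|. The default tolerances are illustrative assumptions: an FP32 TensorRT engine is generally not bit-exact with ONNX Runtime, so small differences are expected, while wrong preprocessing or layout shows up as large ones.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct CompareResult {
    float maxAbsDiff;
    std::size_t numMismatches;  // elements failing |a - b| <= atol + rtol * |b|
};

// Compares a TensorRT output (a) against an ONNX Runtime reference (b).
CompareResult compareOutputs(const std::vector<float>& a, const std::vector<float>& b,
                             float rtol = 1e-3f, float atol = 1e-4f) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("output sizes differ");
    }
    CompareResult r{0.0f, 0};
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = std::fabs(a[i] - b[i]);
        r.maxAbsDiff = std::max(r.maxAbsDiff, diff);
        // Written as !(x <= tol) so that NaN results are counted as mismatches.
        if (!(diff <= atol + rtol * std::fabs(b[i]))) {
            ++r.numMismatches;
        }
    }
    return r;
}
```

Running this per output makes it easy to see whether all four outputs are off (pointing at the input) or only some of them (pointing at specific layers).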



Yes, exactly! The TensorRT and ONNX Runtime results should match.

Best Regards

Some additional information.
The network uses:

  • convolutions with strides > 1 and dilation > 1
  • transposed convolutions
  • ReLU
  • batch normalization
  • sigmoid
  • channel-wise multiplication

Another point - the network was trained on 2 GPUs using DataParallel. Does that make a difference?


We get very similar results with the attached source.
Would you mind giving it a try? (1.2 KB) (2.2 KB)

Although the sample is Python-based, TensorRT should produce identical results from C++ and Python.
Perhaps the data preprocessing differs between your C++ and Python code?
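One way to check this is to print per-channel summary statistics of the input buffer right before it is copied to the GPU, and compare them with the same statistics computed on the NumPy array in Python (sum, min, and max over each channel). A sketch, assuming a contiguous CHW float buffer; the `ChannelStats` struct and the `channelStats`/`printStats` functions are hypothetical names. Since sum, min, and max do not depend on the order of values within a channel, matching statistics with a differing first value point to a reordering inside the plane, such as a transposed or column-major channel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>

struct ChannelStats {
    double sum;
    float minVal;
    float maxVal;
    float first;  // value at (y = 0, x = 0)
};

// Hypothetical helper: per-channel statistics of a contiguous CHW buffer.
std::vector<ChannelStats> channelStats(const std::vector<float>& chw,
                                       std::size_t C, std::size_t H, std::size_t W) {
    const std::size_t plane = H * W;
    if (plane == 0 || chw.size() != C * plane) {
        throw std::invalid_argument("buffer size does not match C*H*W");
    }
    std::vector<ChannelStats> stats;
    for (std::size_t c = 0; c < C; ++c) {
        const float* p = chw.data() + c * plane;
        ChannelStats s{0.0, p[0], p[0], p[0]};
        for (std::size_t i = 0; i < plane; ++i) {
            s.sum += p[i];
            s.minVal = std::min(s.minVal, p[i]);
            s.maxVal = std::max(s.maxVal, p[i]);
        }
        stats.push_back(s);
    }
    return stats;
}

void printStats(const std::vector<ChannelStats>& stats) {
    for (std::size_t c = 0; c < stats.size(); ++c) {
        std::printf("channel %zu: sum=%.6f min=%.6f max=%.6f first=%.6f\n", c,
                    stats[c].sum, stats[c].minVal, stats[c].maxVal, stats[c].first);
    }
}
```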

