ONNX to TensorRT conversion failing with error: "each train expected to have at most one ShapeHostToDeviceNode"

Description

The ONNX file was generated from a PyTorch RetinaNet model and then constant-folded using Polygraphy. When running /usr/src/tensorrt/bin/trtexec --onnx=folded.onnx --saveEngine=model.engine, this is the error:

...
[07/11/2023-17:14:31] [I] [TRT] [GpuLayer] MYELIN: {ForeignNode[onnx::Equal_3245.../model/Concat_92]}
[07/11/2023-17:14:31] [I] [TRT] [GpuLayer] TRAIN_STATION: [trainStation3]
[07/11/2023-17:14:31] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 913, GPU 5828 (MiB)
[07/11/2023-17:14:31] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 914, GPU 5828 (MiB)
[07/11/2023-17:14:31] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[07/11/2023-17:14:31] [E] Error[2]: [injectImplicitPadding.cpp::grabShapeHostToDeviceNodes::419] Error Code 2: Internal Error (Assertion !holder failed. each train expected to have at most one ShapeHostToDeviceNode)
[07/11/2023-17:14:31] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[07/11/2023-17:14:31] [E] Engine could not be created from network
[07/11/2023-17:14:31] [E] Building engine failed
[07/11/2023-17:14:31] [E] Failed to create engine from model or file.
[07/11/2023-17:14:31] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=folded.onnx --saveEngine=model.engine

Any ideas on what is causing this?

Environment

TensorRT Version: 8.5
GPU Type: Jetson Xavier
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version:
Operating System + Version: Ubuntu 20
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.13
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

Using that ONNX file, run /usr/src/tensorrt/bin/trtexec --onnx=folded.onnx --saveEngine=model.engine
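
For context, a minimal sketch of how the folded ONNX could be produced, assuming a torchvision RetinaNet and Polygraphy's constant-folding pass (the export arguments, input resolution, and tensor names are assumptions, not the original script):

import torch
import torchvision

# Assumed export setup; the original export script was not shared.
model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.eval()

dummy_input = torch.randn(1, 3, 800, 800)  # assumed input resolution
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=11,
    input_names=["images"],
    output_names=["boxes", "scores", "labels"],
)

# Constant folding was then done with Polygraphy, e.g.:
#   polygraphy surgeon sanitize model.onnx --fold-constants -o folded.onnx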

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

# Path to your ONNX model (placeholder: replace with your file)
filename = "yourONNXmodel.onnx"
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hi,

We recommend that you try the latest TensorRT version, 8.6.1. If you still face the issue, please share your complete verbose logs with us.

You can also try the TensorRT NVIDIA NGC container for easy setup.

Thank you.

Is it possible to use TensorRT 8.6.1 on Jetson Xavier with JetPack 5.1? I thought I saw some indications that this is not possible.

Hi,

We are moving this post to the Jetson Xavier forum to get better help on the above query.

Thank you.

Hi,

TensorRT 8.6 is not available for Jetson yet.
Let’s check if the error comes from ONNX or TensorRT first.

Have you run the ONNX file with other frameworks like ONNXRuntime?
If not, please do so.
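
For example, a minimal ONNX Runtime check could look like this (the input name and how dynamic dimensions are filled in are assumptions; adjust to your model):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("folded.onnx", providers=["CPUExecutionProvider"])

# Build a dummy input matching the model's declared input (assumed here).
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # replace dynamic dims with 1
dummy = np.random.rand(*shape).astype(np.float32)

outputs = sess.run(None, {inp.name: dummy})
for out, meta in zip(outputs, sess.get_outputs()):
    print(meta.name, out.shape)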

Thanks.

I removed NMS from the PyTorch model and that seems to have resolved this error. The engine now builds fully. However, the process crashes when trying to run the example inference code found here: https://github.com/NVIDIA/TensorRT/blob/main/quickstart/SemanticSegmentation/tutorial-runtime.ipynb

Traceback (most recent call last):
  File "run.py", line 85, in <module>
    infer(engine, input_file, output_file)
  File "run.py", line 67, in infer
    output_memory = cuda.mem_alloc(output_buffer.nbytes)
pycuda._driver.LogicError: cuMemAlloc failed: invalid argument
terminate called after throwing an instance of 'nvinfer1::plugin::CudnnError'
  what():  std::exception
Aborted (core dumped)

The error seems to stem from the size of one of the bindings being 0. Any idea why? Is there a more up-to-date inference example I should be using?
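
To confirm which binding has zero size, a quick inspection with the TensorRT 8.5 Python API could look like this (the engine path is assumed; get_binding_shape is the pre-TensorRT-9 binding API):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built by trtexec (path assumed).
with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    shape = engine.get_binding_shape(i)
    is_input = engine.binding_is_input(i)
    # A -1 (dynamic) or 0 dimension here explains a zero-byte allocation downstream.
    print(f"{'input ' if is_input else 'output'} {name}: {shape}")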

Hi,

Could you first check whether the engine file works with the trtexec tool?
Below is another inference example for your reference:
https://elinux.org/Jetson/L4T/TRT_Customized_Example#OpenCV_with_PLAN_model

Thanks.

It does work with the trtexec tool. I was able to get it working with that inference code by modifying the model so that its output was not dynamically shaped.
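
For reference, if you wanted to keep dynamic shapes instead, one approach (a sketch under assumptions, not the official sample) is to set the input shape on the execution context and allocate output buffers from the resolved shapes, e.g. with the TensorRT 8.5 / PyCUDA APIs:

import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.driver as cuda
import numpy as np
import tensorrt as trt

# Assumes `engine` was already deserialized as in the earlier snippet.
context = engine.create_execution_context()

# Resolve dynamic dimensions by fixing the input shape first (shape assumed).
input_idx = 0
context.set_binding_shape(input_idx, (1, 3, 800, 800))

buffers = []
for i in range(engine.num_bindings):
    # Shapes from the context are resolved unless they are data-dependent (e.g. NMS outputs).
    shape = context.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    # Guard against zero-byte allocations, which make cuMemAlloc fail with "invalid argument".
    buffers.append(cuda.mem_alloc(max(nbytes, 1)))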
