Inference time after conversion and RAM usage


Hi, I’m trying to convert a model to a TensorRT engine. The problem is that the engine takes more time than the PyTorch code: inference time increases by approximately 750%. This is when I run on an NX. When I try to run on a Nano, I run into memory issues and the process is killed. (I built separate engines for the separate devices.)
Please suggest how I could reduce the inference time and memory consumption so that I can successfully use this model on a Nano.


TensorRT Version:
GPU Type: Jetson Nano
L4T 32.6.1
CUDA Version: 10.2.300
CUDNN Version:
Operating System + Version: Ubuntu 18.04.6 LTS
Python Version (if applicable): 3.6

Relevant Files

Attached Link. (I’ve attached the one after running polygraphy command)

Steps To Reproduce

polygraphy surgeon sanitize model.onnx --fold-constants --output model_folded.onnx
trtexec --onnx=model_folded.onnx --fp16 --workspace=1500 --verbose --saveEngine=model.trt


Have you tried running the model with ONNX Runtime, and do you see the same issue there?
We also recommend sharing the trtexec --verbose log output.

Thank you.

I ran into similar issues when using ONNX Runtime. I was hoping that TensorRT would help me sidestep these issues. Currently, I have the logs from when I was building the int8 version. Please let me know if you need me to run it again for the other logs.

Thanks, your help is always appreciated.
trt_verbose_output.txt (14.9 MB)

Please note that the verbose log I posted is for the NX, not the Nano. A couple more things: the 750% increase in time that I mentioned was measured on the NX, not the Nano, since the Nano ran out of memory. Also, the int8 engine gave the same inference time as the fp16 engine.
Any improvements you suggest would be highly appreciated.

Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

import onnx
filename = "model.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid

  2. Try running your model with the trtexec command.
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
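
For step 2, one possible way to capture the full verbose log in a file is sketched below. It assumes trtexec is on the PATH and reuses only the flags already shown in this thread; the helper names build_trtexec_cmd and run_and_log, and the log file name, are hypothetical.

```python
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=True, workspace_mb=1500):
    """Assemble the trtexec build command with the flags used in this thread."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--workspace={workspace_mb}",
        "--verbose",
        f"--saveEngine={engine_path}",
    ]
    if fp16:
        # Place --fp16 right after --onnx, matching the original command order.
        cmd.insert(2, "--fp16")
    return cmd


def run_and_log(cmd, log_path):
    """Run cmd, writing stdout and stderr to log_path; return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Example usage (not executed here, since it needs trtexec installed):
#   cmd = build_trtexec_cmd("model_folded.onnx", "model.trt")
#   run_and_log(cmd, "trt_verbose_output.txt")
```

The resulting text file can then be attached to the thread as-is.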

Sorry for the delayed response. I had already shared the ONNX file as well as the verbose output. I’m not sure which script you are asking me to share.
The increase in time was also present when I checked with the trtexec command (trtexec --loadEngine=engine.trt --iterations=100).
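
To keep the PyTorch and TensorRT numbers comparable, both can be timed the same way, with a few warm-up runs discarded first. The sketch below is only an illustration: the helper names are hypothetical, fn stands for whatever callable runs one inference, and the 20 ms and 170 ms figures are made-up values chosen to show what a 750% increase means.

```python
import statistics
import time


def median_latency_ms(fn, warmup=10, iterations=100):
    """Median wall-clock latency of fn() in milliseconds, after warm-up calls.

    For GPU work, fn must block until the result is ready (for PyTorch,
    e.g. by calling torch.cuda.synchronize() after the forward pass).
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def percent_increase(baseline_ms, new_ms):
    """Relative increase of new_ms over baseline_ms, in percent."""
    return (new_ms - baseline_ms) / baseline_ms * 100.0


# Made-up numbers: going from 20 ms to 170 ms is a 750% increase (8.5x slower).
print(percent_increase(20.0, 170.0))
```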


Are you facing the same issue with the latest TensorRT version, 8.2.2?

Thank you.

I’m using:
NVIDIA Jetson Xavier NX (Developer Kit Version)
L4T 32.6.1 [ JetPack 4.6 ]
Ubuntu 18.04.6 LTS
Kernel Version: 4.9.253-tegra
CUDA 10.2.300
CUDA Architecture: 7.2
OpenCV version: 4.4.0
OpenCV Cuda: YES
Vision Works:
VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
Vulkan: 1.2.70

Can you guide me on how to upgrade to TensorRT 8.2.2?
P.S. The Nano is the device on which I finally want to run it. The Nano’s environment is also shared in the first post.