Inference time after conversion and RAM usage

Description

Hi, I’m trying to convert a model to a TRT engine. The issue is that the engine is taking more time than the PyTorch code: there is an approx. 750% increase in inference time. This is when I run on an NX. When I try to run on a Nano, I run into memory issues and the process is killed. (I built separate engines for the separate devices.)
Please suggest how I could improve the time and memory consumption so that I can successfully use this model on a Nano.
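(For reference, a fair PyTorch-vs-TensorRT timing comparison would look roughly like the sketch below; the model here is only a stand-in for the real DPT segmentation network and the input shape is a placeholder, the key point being that CUDA work is synchronized before reading the clock.)

import time
import torch

device = torch.device("cuda")
model = torch.nn.Conv2d(3, 16, 3, padding=1).to(device).eval()  # stand-in for the real network
x = torch.randn(1, 3, 384, 384, device=device)                  # placeholder input shape

with torch.no_grad():
    for _ in range(10):        # warm-up so kernels are cached before timing
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
print("mean latency: %.2f ms" % ((time.perf_counter() - start) / 100 * 1000))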

Environment

TensorRT Version: 8.0.1.6
GPU Type: Jetson Nano
L4T 32.6.1
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: Ubuntu 18.04.6 LTS
Python Version (if applicable): 3.6

Relevant Files

Attached link. (I’ve attached the model after running the polygraphy command.)

Steps To Reproduce

polygraphy surgeon sanitize model.onnx --fold-constants --output model_folded.onnx
trtexec --onnx=model_folded.onnx --fp16 --workspace=1500 --verbose --saveEngine=model.trt
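(For reference, the saved engine can also be timed from Python with the TensorRT runtime API and pycuda. This is a generic sketch, not the actual inference script: it assumes static binding shapes and feeds a dummy input purely for latency measurement.)

import time

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host/device buffers for every binding (assumes static shapes).
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(trt.volume(shape), dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))
    if engine.binding_is_input(i):
        cuda.memcpy_htod(dev, host)  # dummy input, timing only

for _ in range(10):                  # warm-up
    context.execute_v2(bindings)
start = time.perf_counter()
for _ in range(100):
    context.execute_v2(bindings)
print("mean latency: %.2f ms" % ((time.perf_counter() - start) / 100 * 1000))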

Hi,

Have you tried running the model on ONNX Runtime, and are you facing the same issue?
Also, we recommend sharing the trtexec --verbose log output.
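For example, a quick ONNX Runtime latency check could look like the sketch below (the input name and shape are read from the model; CUDAExecutionProvider assumes onnxruntime-gpu is installed, otherwise it falls back to CPU):

import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model_folded.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # replace dynamic dims
x = np.random.rand(*shape).astype(np.float32)

for _ in range(10):                  # warm-up
    sess.run(None, {inp.name: x})
start = time.perf_counter()
for _ in range(100):
    sess.run(None, {inp.name: x})
print("mean latency: %.2f ms" % ((time.perf_counter() - start) / 100 * 1000))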

Thank you.

I ran into similar issues when using ONNX Runtime. I was hoping that TensorRT would help me sidestep these issues. Currently, I have the logs from when I was building the INT8 version. Please let me know if you need me to run it again for the other logs.

Thanks, your help is always appreciated.
trt_verbose_output.txt (14.9 MB)

Please note that the verbose log I posted is for the NX, not the Nano. A couple more things: the 750% increase in time that I mentioned was measured on the NX, not the Nano, since it ran out of memory on the Nano. Also, the INT8 engine gave the same inference time as the FP16 engine.
Any improvements you suggest would be highly appreciated.

Hi,
We request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

1. Validate your model with the snippet below.

check_model.py

import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, we request you to share the trtexec "--verbose" log for further debugging.
Thanks!

Sorry for the delayed response. I had already shared the ONNX file as well as the verbose output. I’m not sure which script you are asking me to share.
The increase in time was also there when I checked with the trtexec command (trtexec --loadEngine=engine.trt --iterations=100).

Hi,

Are you facing the same issue with the latest TensorRT version, 8.2.2?

Thank you.

I’m using:
NVIDIA Jetson Xavier NX (Developer Kit Version)
L4T 32.6.1 [ JetPack 4.6 ]
Ubuntu 18.04.6 LTS
Kernel Version: 4.9.253-tegra
CUDA 10.2.300
CUDA Architecture: 7.2
OpenCV version: 4.4.0
OpenCV Cuda: YES
CUDNN: 8.2.1.32
TensorRT: 8.0.1.6
VisionWorks: 1.6.0.501
VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
Vulkan: 1.2.70

Can you guide me on how to upgrade to TensorRT 8.2.2?
PS: The Nano is the device on which I finally want to run it; the Nano’s environment is also shared in the first post.

Hi,

Please share the issue-repro ONNX model and script so that we can try it from our end for better debugging.

Thank you.

Hi,
Hi, this is the repo. trt_segmentation.py is the file that I use for inference.
Note: I had to modify the dpt/vit.py file to export the ONNX file; the unflatten call was replaced by view.
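For context, the change was along these lines (a simplified sketch; the actual tensors and dimension values in dpt/vit.py differ):

import torch

x = torch.randn(2, 768)  # stand-in for the flattened activation

# Original code used torch.Tensor.unflatten, which the ONNX export did not handle:
# y = x.unflatten(1, (12, 64))

# Replacement that exports cleanly:
y = x.view(x.shape[0], 12, 64)
print(y.shape)  # torch.Size([2, 12, 64])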

Hi,

Sorry for the delayed response. We could not reproduce the issue.
We recommend trying to increase the workspace. If you still face this issue, we recommend posting on the Jetson Nano related forum.
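For example, if device memory allows, the engine could be rebuilt with a larger workspace (the value below is only an illustration and is limited by the memory actually available on the device):

trtexec --onnx=model_folded.onnx --fp16 --workspace=3000 --verbose --saveEngine=model.trt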

Thank you.

Can you share the ONNX file / TensorRT engine file you generated for the above model?
Did you follow the same steps mentioned in this thread? Also, I used a 3500 MB workspace on the NX and 1500 MB on the Nano; giving more usually just kills the process.

Thanks.

Hi,

Sharing the TensorRT engine file may not help you, as a TensorRT engine is specific to the platform it was built on and is not portable across other platforms.
We tried the steps mentioned in this post, but we could not successfully set up the repo you shared. It would be great if we could receive a simple script to run. However, this issue looks specific to the Jetson platform. We recommend posting on the Jetson related forum with an issue repro to get better help.

Thank you.