Infer time after conversion and ram usage

romilaggarwal611 · December 8, 2021, 10:36am

Description

Hi, I’m trying to convert a model to a TRT engine. The issue is that the engine takes more time than the PyTorch code: there is an approx. 750% increase in time taken. This is when I run on an NX. When I try to run on a Nano, I run into memory issues and the process is killed. (I built separate engines for the separate devices.)
Please suggest how I could improve the time and memory consumption so that I can successfully use this model on a Nano.
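
For scale, a 750% increase means the engine takes 8.5 times as long as the PyTorch baseline. A minimal sketch of that arithmetic follows. The helper names and the 100 ms baseline are made up for illustration and are not measurements from this thread.

```python
def percent_increase(baseline_ms: float, new_ms: float) -> float:
    """Return the relative increase of new_ms over baseline_ms, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (new_ms - baseline_ms) / baseline_ms * 100.0


def slowdown_factor(increase_percent: float) -> float:
    """Convert a percent increase into a multiplicative slowdown factor."""
    return 1.0 + increase_percent / 100.0


# Hypothetical baseline latency, NOT a number reported in this thread.
baseline_ms = 100.0
engine_ms = baseline_ms * slowdown_factor(750.0)

print(slowdown_factor(750.0))                    # 8.5
print(percent_increase(baseline_ms, engine_ms))  # 750.0
```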

Environment

TensorRT Version: 8.0.1.6
GPU Type: Jetson Nano
L4T 32.6.1
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: Ubuntu 18.04.6 LTS
Python Version (if applicable): 3.6

Relevant Files

Attached Link. (I’ve attached the one after running polygraphy command)

Steps To Reproduce

polygraphy surgeon sanitize model.onnx --fold-constants --output model_folded.onnx
trtexec --onnx=model_folded.onnx --fp16 --workspace=1500 --verbose --saveEngine=model.trt
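
Since separate engines are built per device with different workspace sizes, the two commands above can be parameterised. The following is only a sketch: the function names are hypothetical, and it uses only the flags already shown in this post. It prints the commands rather than running them.

```python
import shlex
from typing import List


def sanitize_cmd(onnx_in: str, onnx_out: str) -> List[str]:
    """Build the polygraphy constant-folding command from the post."""
    return ["polygraphy", "surgeon", "sanitize", onnx_in,
            "--fold-constants", "--output", onnx_out]


def trtexec_build_cmd(onnx_path: str, engine_path: str, workspace_mb: int,
                      fp16: bool = True, verbose: bool = True) -> List[str]:
    """Build the trtexec engine-build command from the post.

    workspace_mb is passed straight to --workspace (a size in megabytes).
    """
    cmd = ["trtexec", "--onnx={}".format(onnx_path)]
    if fp16:
        cmd.append("--fp16")
    cmd.append("--workspace={}".format(workspace_mb))
    if verbose:
        cmd.append("--verbose")
    cmd.append("--saveEngine={}".format(engine_path))
    return cmd


def to_shell(cmd: List[str]) -> str:
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Print the commands; nothing is executed here.
print(to_shell(sanitize_cmd("model.onnx", "model_folded.onnx")))
print(to_shell(trtexec_build_cmd("model_folded.onnx", "model.trt", 1500)))
```

To actually run a command, the list can be passed to `subprocess.run(cmd, check=True)` on the target device.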

spolisetty · December 8, 2021, 11:36am

Hi,

Have you tried running the model on ONNX Runtime, and do you face the same issue there?
Also, please share the output logs of trtexec --verbose.

Thank you.

romilaggarwal611 · December 8, 2021, 12:16pm

I ran into similar issues when using ONNX Runtime. I was hoping that TensorRT would help me sidestep them. Currently, I have the logs from when I was building the int8 version. Please let me know if you need me to run it again for the other logs.

Thanks, your help is always appreciated.
trt_verbose_output.txt (14.9 MB)

romilaggarwal611 · December 8, 2021, 1:06pm

Please note that the verbose log I posted is for the NX and not the Nano. A couple more things: the 750% increase in time that I mentioned was measured on the NX, not the Nano, since the Nano ran out of memory. Also, the int8 engine gave the same inference time as the fp16 engine.
Any improvements you suggest would be highly appreciated.

NVES · December 9, 2021, 5:38am

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

1) Validate your model with the snippet below.

check_model.py

import sys
import onnx

# Path to your ONNX model, passed as the first command-line argument
filename = sys.argv[1]
model = onnx.load(filename)
# Raises an exception if the model is not valid
onnx.checker.check_model(model)

2) Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

romilaggarwal611 · December 21, 2021, 10:59am

Sorry for the delayed response. I had already shared the onnx file as well as the verbose output. I’m not sure which script you are asking for me to share.
The increase in time was also there when I checked with trtexec command. (trtexec --loadEngine=engine.trt --iterations=100)
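
For a like-for-like comparison between the PyTorch model and the engine outside trtexec, both can be timed with the same harness. The following is a minimal sketch under stated assumptions: `benchmark` is a hypothetical helper, and the callable passed in is assumed to run one complete inference and to block until the result is ready (for PyTorch on GPU, that means calling `torch.cuda.synchronize()` inside it).

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], iterations: int = 100,
              warmup: int = 10) -> Dict[str, float]:
    """Time fn() and return mean and median latency in milliseconds.

    fn must block until its work is finished; otherwise asynchronous
    GPU launches make the measured time too small.
    """
    # Warm-up runs are excluded from the statistics.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
    }


# Stand-in workload; replace with a call that runs one inference.
stats = benchmark(lambda: sum(range(10000)), iterations=100)
print(stats)
```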

spolisetty · January 4, 2022, 1:41pm

Hi,

Are you facing the same issue with the latest TensorRT version, 8.2.2?

Thank you.

romilaggarwal611 · January 4, 2022, 3:20pm

I’m using:
NVIDIA Jetson Xavier NX (Developer Kit Version)
L4T 32.6.1 [ JetPack 4.6 ]
Ubuntu 18.04.6 LTS
Kernel Version: 4.9.253-tegra
CUDA 10.2.300
CUDA Architecture: 7.2
OpenCV version: 4.4.0
OpenCV Cuda: YES
CUDNN: 8.2.1.32
TensorRT: 8.0.1.6
Vision Works: 1.6.0.501
VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
Vulkan: 1.2.70

Can you guide me on how to convert to TensorRT 8.2.2?
ps: nano is the device on which I finally want to run it. nano’s environment is also shared in the first post.

spolisetty · January 25, 2022, 6:56am

Hi,

Please share the ONNX model and a script that reproduce the issue so that we can try it on our end for better debugging.

Thank you.

romilaggarwal611 · February 2, 2022, 8:15am

Hi,
This is the repo. trt_segmentation.py is the file that I use for inference.
Note, I had to modify dpt/vit.py file to export the onnx file. The unflatten function was replaced by view.

spolisetty · February 15, 2022, 5:43am

Hi,

Sorry for the delayed response. We could not reproduce the issue.
We recommend that you try increasing the workspace. If you still face this issue, please reach out on the Jetson Nano forum.

Thank you.

romilaggarwal611 · February 15, 2022, 6:13am

Can you share the onnx file/tensorRT engine file you generated for the above model?
Did you follow the same steps mentioned in this thread? Also, I used 3500 workspace on a NX and 1500 on a nano. Giving more just usually kills the program.

Thanks.

spolisetty · February 15, 2022, 6:41am

Hi,

Sharing the TensorRT engine file may not help you. A TensorRT engine is specific to the platform it was built on and is not portable across other platforms.
We tried following the steps in the post, but we couldn’t successfully set up the repo shared earlier. It would be great if we could get a simple script to run. However, this issue looks specific to the Jetson platform. We recommend posting on the Jetson forum with a repro of the issue to get better help.

Thank you.
