Hi, I’m trying to convert a model to a TRT engine. The issue is that the engine is taking more time than the PyTorch code, an approx. 750% increase in the time taken. This is when I run it on an NX. When I try to run on a Nano, I run into memory issues and the process is killed. (I built separate engines for the two devices.)
Please suggest how I could improve the time and memory consumption so that I can successfully use this model on a Nano.
Environment
TensorRT Version: 8.0.1.6
GPU Type: Jetson Nano
L4T Version: 32.6.1
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: Ubuntu 18.04.6 LTS
Python Version (if applicable): 3.6
Relevant Files
Attached link. (I’ve attached the file produced after running the polygraphy command.)
I ran into similar issues when using ONNX Runtime. I was hoping that TensorRT would help me sidestep them. Currently, I have the logs from when I was building the INT8 version. Please let me know if you need me to run it again for the other logs.
Please note that the verbose log I posted is for the NX, not the Nano. A couple more things: the 750% increase in time that I mentioned was measured on the NX, not the Nano, since the Nano ran out of memory. Also, the INT8 engine gave the same inference time as the FP16 engine.
Any improvements you suggest would be highly appreciated.
Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the below snippet:
check_model.py
import onnx

# Replace with the path to your ONNX model.
filename = "your_model.onnx"
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
2) Try running your model with the trtexec command: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec "--verbose" log for further debugging.
Thanks!
Sorry for the delayed response. I had already shared the ONNX file as well as the verbose output. I’m not sure which script you are asking me to share.
The increase in time was also there when I checked with the trtexec command. (trtexec --loadEngine=engine.trt --iterations=100)
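For reference, this is roughly how I time the engine from Python to cross-check the trtexec numbers. It is a minimal sketch: the engine path, warm-up count, and iteration count are placeholders, it assumes static input shapes, and it uses pycuda for buffer allocation.

import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda

ENGINE_PATH = "engine.trt"  # placeholder path

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer per binding (assumes static shapes).
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    shape = tuple(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(shape, dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    bindings.append(int(dev))
    host_bufs.append(host)
    dev_bufs.append(dev)
    if engine.binding_is_input(i):
        cuda.memcpy_htod(dev, host)  # a dummy input is enough for timing

for _ in range(10):  # warm-up
    context.execute_v2(bindings)

iters = 100
start = time.perf_counter()
for _ in range(iters):
    context.execute_v2(bindings)
print("mean latency: %.2f ms" % ((time.perf_counter() - start) / iters * 1000))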
Can you guide me on how to convert to TensorRT 8.2.2?
PS: The Nano is the device I ultimately want to run it on. The Nano’s environment is also shared in the first post.
Hi,
This is the repo. The trt_segmentation.py file is the one I use for inference.
Note: I had to modify the dpt/vit.py file to export the ONNX file. The unflatten function was replaced by view.
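For illustration, the change was along these lines. This is only a rough sketch; the actual tensor names and shapes in dpt/vit.py differ.

import torch

# Hypothetical tensor; the real shapes in dpt/vit.py are different.
x = torch.randn(1, 768, 576)

# Original style (not handled by the ONNX exporter I used):
# y = x.unflatten(2, (24, 24))

# Replacement that exports cleanly:
y = x.view(x.shape[0], x.shape[1], 24, 24)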
Sorry for the delayed response. We could not reproduce the issue.
We recommend you try increasing the workspace. If you still face this issue, we recommend you reach out to the Jetson Nano related forum.
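For example, a minimal build script along these lines makes it easy to experiment with the workspace limit and the FP16 flag. The file names and workspace size are placeholders; adjust them for your model and device.

import tensorrt as trt

ONNX_PATH = "model.onnx"   # placeholder; use your exported model
WORKSPACE_MB = 1500        # e.g. larger on the NX, smaller on the Nano

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

if not parser.parse_from_file(ONNX_PATH):
    for i in range(parser.num_errors):
        print(parser.get_error(i))
    raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.max_workspace_size = WORKSPACE_MB << 20  # workspace limit in bytes (TensorRT 8.0 API)
config.set_flag(trt.BuilderFlag.FP16)           # remove for an FP32 engine

serialized = builder.build_serialized_network(network, config)
if serialized is None:
    raise SystemExit("engine build failed")
with open("engine.trt", "wb") as f:
    f.write(serialized)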
Can you share the ONNX file / TensorRT engine file you generated for the above model?
Did you follow the same steps mentioned in this thread? Also, I used a workspace of 3500 on the NX and 1500 on the Nano. Giving more usually just kills the process.
Sharing the TensorRT engine file may not help you. A TensorRT engine is specific to the platform it was built on and is not portable across other platforms.
We tried using the steps shared in the post. We could not successfully set up the previous repo. It would be great if we could receive a simple script to run. However, this issue looks specific to the Jetson platform. We recommend posting on the Jetson related forum with an issue repro to get better help.