Hi, I’m trying to convert a model to a TensorRT engine. The issue is that the engine takes more time than the PyTorch code: there is an approx. 750% increase in time taken. This is when I run on an NX. When I try to run on a Nano, I run into memory issues and the process is killed. (I built separate engines for the separate devices.)
Please suggest how I could improve the time and memory consumption so that I can successfully use this model on a Nano.
TensorRT Version: 126.96.36.199
GPU Type: Jetson Nano
L4T 32.6.1
CUDA Version: 10.2.300
CUDNN Version: 188.8.131.52
Operating System + Version: Ubuntu 18.04.6 LTS
Python Version (if applicable): 3.6
Link attached. (I’ve attached the one produced after running the polygraphy command.)
I ran into similar issues when using ONNX Runtime. I was hoping that TensorRT would help me sidestep these issues. Currently, I have the logs from building the int8 version. Please let me know if you need me to run it again for the other logs.
Please note that the verbose log I posted is for the NX and not the Nano. A couple more things: the 750% increase in time that I mentioned was measured on the NX, not the Nano, because the Nano ran out of memory. Also, the int8 engine gave the same inference time as the fp16 engine.
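For readers following the numbers, here is a minimal sketch of what a 750% increase means as a ratio, assuming the figure is a true percentage increase over the PyTorch time rather than a 7.5x ratio. The 10 ms and 85 ms values are hypothetical and are not measurements from this thread.

```python
def slowdown_factor(percent_increase):
    """Convert a percentage increase in latency into a new/old time ratio."""
    return 1.0 + percent_increase / 100.0


def percent_increase(baseline_ms, new_ms):
    """Percentage increase of new_ms relative to baseline_ms."""
    return (new_ms - baseline_ms) / baseline_ms * 100.0


if __name__ == "__main__":
    # A 750% increase means the engine takes 8.5x as long as the baseline.
    print(slowdown_factor(750))
    # Hypothetical latencies: 10 ms baseline, 85 ms for the engine.
    print(percent_increase(10.0, 85.0))
```

Stating the two raw latencies alongside the percentage avoids this ambiguity.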
Any improvements you suggest would be highly appreciated.
Sorry for the delayed response. I had already shared the ONNX file as well as the verbose output. I’m not sure which script you are asking me to share.
The increase in time was also there when I checked with the trtexec command. (trtexec --loadEngine=engine.trt --iterations=100)
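To compare the PyTorch timing against the trtexec figure on equal terms, a small timing harness along these lines may help. This is only a sketch: `benchmark`, `summarize` and `dummy_inference` are hypothetical names, and the dummy workload stands in for a real inference call. For a GPU model, the timed function should wait for the device to finish (for PyTorch, by calling `torch.cuda.synchronize()`) before returning, otherwise only the kernel launch is measured.

```python
import statistics
import time


def benchmark(fn, warmup=10, iterations=100):
    """Call fn() repeatedly and return per-call latencies in milliseconds.

    Warm-up calls are executed first and are not timed.
    """
    for _ in range(warmup):
        fn()
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Reduce a list of latencies to mean, median and max."""
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    def dummy_inference():
        # Placeholder workload; replace with the real model call.
        # For a GPU model, synchronize the device before returning.
        sum(i * i for i in range(10000))

    print(summarize(benchmark(dummy_inference, warmup=5, iterations=20)))
```

Using the same warm-up and iteration counts for both the PyTorch model and the engine keeps the two numbers comparable.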
Sorry for the delayed response. We could not reproduce the issue.
We recommend that you try increasing the workspace. If you still face this issue, we recommend that you reach out on the Jetson Nano related forum.
Can you share the ONNX file/TensorRT engine file you generated for the above model?
Did you follow the same steps mentioned in this thread? Also, I used a workspace of 3500 on an NX and 1500 on a Nano. Giving more usually just kills the program.
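For reference, a minimal sketch of how per-device workspace limits like these could be passed to trtexec when building from ONNX. It assumes the values are in megabytes, which is the unit of the `--workspace` flag in the trtexec shipped with TensorRT 8.0 (newer releases move this setting to `--memPoolSize`). The helper name `trtexec_build_cmd` and the file names are hypothetical. The script only prints the commands and does not run them.

```python
import shlex


def trtexec_build_cmd(onnx_path, engine_path, workspace_mb, fp16=True):
    """Assemble a trtexec command that builds an engine from an ONNX file.

    workspace_mb is passed to --workspace, which is interpreted in megabytes.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mb}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


if __name__ == "__main__":
    # Workspace values quoted in the thread; file names are placeholders.
    for device, workspace_mb in (("nx", 3500), ("nano", 1500)):
        cmd = trtexec_build_cmd("model.onnx", f"engine_{device}.trt", workspace_mb)
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each command has to be run on the device the engine is meant for, since an engine built on one device is not expected to work on the other.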
Sharing the TensorRT engine file may not help. A TensorRT engine is specific to the platform it was built on and is not portable across other platforms.
We tried using what was shared in the post, but we couldn’t successfully set up the previous repo. It would be great if we could receive a simple script to run. However, this issue looks specific to the Jetson platform. We recommend posting on the Jetson related forum with an issue repro to get better help.