Jetson Nano Python 3.7 version for TensorRT

Description

I was trying to use YOLOv8 on a Jetson Nano, but I just read that the minimum Python version required by YOLOv8 is 3.7.
However, the latest JetPack available for the Jetson Nano is 4.6.3, which includes Python 3.6 and cannot be upgraded, because the bundled TensorRT version only works with Python 3.6.
I can obtain the .engine file for YOLOv8 from my regular computer, but it must be built with the same TensorRT version to work on the Jetson, and I can't export with that version: TensorRT 8.2.1.8 only works with Python 3.6, while I need Python 3.7 to export an engine from a YOLOv8 model.

Is NVIDIA planning to upgrade TensorRT in upcoming JetPack versions? Or can I use another program, outside of the YOLOv8 library, to obtain the .engine file?

Thank you so much for reading
Greetings

Environment

TensorRT Version: 8.2.1.8
GPU Type: Jetson Nano
Nvidia Driver Version: JetPack 4.6.3
CUDA Version: 10.2
CUDNN Version: 8.2.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10.0
Baremetal or Container (if container which image + tag):

Relevant Files

JetPack 4.6.3: https://developer.nvidia.com/jetpack-sdk-463
TensorRT Python compatibility: Support Matrix :: NVIDIA Deep Learning TensorRT Documentation
YOLOv8 GitHub repository: https://github.com/ultralytics/ultralytics

Hi,

We are moving this post to the Jetson Nano forum to get better help.

Thank you

Hi,

The TensorRT Python bindings are open source.
You can build them for Python 3.7 on your own.
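Once built, a quick sanity check (a sketch, nothing more) confirms the bindings load under the intended interpreter:

# Verify that the rebuilt TensorRT bindings import under Python 3.7.
import sys
import tensorrt as trt
print(sys.version_info[:2], trt.__version__)  # expect (3, 7) and 8.2.x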

Thanks.

OK, I successfully installed the TensorRT Python bindings by following this tutorial:

But I forgot to mention that I also need Torch and Torchvision for Python 3.7.
I’m now trying this tutorial because I didn’t find any .whl file for PyTorch with CUDA 10.2 and Python 3.7:

But the build runs out of RAM even when using the LXDE environment.
I can train and test on external computers. However, in order to export the .engine model and use it on the Jetson:
Is the same architecture (arm64) required, or only the same TensorRT, CUDA, and cuDNN versions?
Thank you so much for answering.

Hi,

An alternative is to convert the PyTorch model into ONNX format.
The ONNX model can then be converted into a TensorRT engine directly with the trtexec binary.
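If you prefer to stay in Python, the same conversion can also be done with the TensorRT Python API. A minimal sketch (file names assumed, not from this thread):

# Build a TensorRT engine from an ONNX file with the TensorRT 8.2 Python API.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# YOLOv8 exports use explicit batch, so create the network accordingly.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov8n.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28  # 256 MiB; keep this modest on the Nano

serialized = builder.build_serialized_network(network, config)
with open("yolov8n.engine", "wb") as f:
    f.write(serialized)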

Thanks.

Thanks,
I tried to convert an ONNX file to an engine using trtexec with this command:

/usr/src/tensorrt/bin/trtexec --onnx=yolov8_model.onnx --shapes=data:1x3x640x640 --saveEngine=output.engine

but I receive this error:

[03/07/2023-09:28:20] [E] [TRT] /model.22/dfl/Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.
[03/07/2023-09:28:20] [E] [TRT] /model.22/dfl/Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.
[03/07/2023-09:28:20] [E] [TRT] /model.22/dfl/Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.
[03/07/2023-09:28:20] [E] [TRT] /model.22/dfl/Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.
[03/07/2023-09:28:20] [E] [TRT] /model.22/dfl/Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.
[03/07/2023-09:28:20] [E] [TRT] /model.22/dfl/Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.
[03/07/2023-09:28:20] [E] [TRT] /model.22/dfl/Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.
ERROR: onnx2trt_utils.cpp:188 In function convertAxis:
[8] Assertion failed: axis >= 0 && axis < nbDims
[03/07/2023-09:28:20] [E] Failed to parse onnx file
[03/07/2023-09:28:20] [E] Parsing model failed
[03/07/2023-09:28:20] [E] Engine creation failed
[03/07/2023-09:28:20] [E] Engine set up failed

I don't know what is causing it. I also tried the model simplified with onnxsim, and I obtain the same error.
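For reference, the onnxsim step was essentially this (a minimal sketch with assumed file names):

# Simplify the exported model with onnx-simplifier before running trtexec.
import onnx
from onnxsim import simplify

model = onnx.load("yolov8n.onnx")
model_simplified, check = simplify(model)
assert check, "simplified model failed validation"
onnx.save(model_simplified, "yolov8n_sim.onnx")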

Thanks

Hi,

Based on the output log:

Reshape: volume mismatch. Input dimensions [1,72,8400] have volume 604800 and output dimensions [1,4,16,8400] have volume 537600.

It seems that your model has some issues in its definition.
Could you run the ONNX model with another framework, e.g. onnxruntime?
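For example, a minimal onnxruntime check could look like this (a sketch; the file name is assumed):

# Sanity-check the ONNX model with onnxruntime before converting it.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])  # output shapes depend on the class count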

Thanks.

Hi,
I don't know what you mean by running the ONNX model. I convert the .pt model into ONNX, then use a USB drive to move it to the Jetson and run trtexec in order to obtain the .engine file.

I'm using this code, which relies on the YOLOv8 library, to export the ONNX model:

from ultralytics import YOLO
model = YOLO("yolov8n_AEGIS.pt")
success = model.export(format="onnx", device=0)  # export the model to onnx format

And the input size is correct:

import onnx

model = onnx.load("yolov8n_AEGIS.onnx")
print([[d.dim_value for d in _input.type.tensor_type.shape.dim] for _input in model.graph.input])
> [[1, 3, 640, 640]]
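The ONNX checker can also validate the graph itself (a sketch, same assumed file name):

# Validate the exported graph structure with the ONNX checker.
import onnx

model = onnx.load("yolov8n_AEGIS.onnx")
onnx.checker.check_model(model)  # raises an exception if the graph is malformed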

Here is the .pt file without training:
yolov8n.pt (6.2 MB)

Hi,

Please also attach the ONNX model.
Thanks.

Hi,
Here is the untrained model, but the behaviour is the same for both. It was extracted directly with the code above.
Thanks
yolov8n.onnx (12.2 MB)

Hi,

We can run your model with JetPack 4.6.3.
Please try our command and test it again.

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov8n.onnx 
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=yolov8n.onnx
...
[03/15/2023-15:20:07] [I] 
[03/15/2023-15:20:07] [I] === Trace details ===
[03/15/2023-15:20:07] [I] Trace averages of 10 runs:
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3103 ms - Host latency: 12.5592 ms (end to end 12.5684 ms, enqueue 3.4141 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3089 ms - Host latency: 12.5581 ms (end to end 12.5707 ms, enqueue 3.2359 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3176 ms - Host latency: 12.5658 ms (end to end 12.5755 ms, enqueue 3.17767 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3133 ms - Host latency: 12.5615 ms (end to end 12.5723 ms, enqueue 3.15106 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3174 ms - Host latency: 12.5663 ms (end to end 12.5772 ms, enqueue 3.2137 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.313 ms - Host latency: 12.5623 ms (end to end 12.5723 ms, enqueue 3.09373 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3168 ms - Host latency: 12.5658 ms (end to end 12.5759 ms, enqueue 3.10371 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3071 ms - Host latency: 12.5553 ms (end to end 12.5655 ms, enqueue 3.09354 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3154 ms - Host latency: 12.5646 ms (end to end 12.5739 ms, enqueue 3.12062 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.317 ms - Host latency: 12.5661 ms (end to end 12.5771 ms, enqueue 3.16703 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3064 ms - Host latency: 12.5543 ms (end to end 12.5644 ms, enqueue 3.13046 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3166 ms - Host latency: 12.5645 ms (end to end 12.5752 ms, enqueue 3.08904 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3103 ms - Host latency: 12.5584 ms (end to end 12.5689 ms, enqueue 3.10193 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3223 ms - Host latency: 12.5711 ms (end to end 12.5802 ms, enqueue 3.06229 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3124 ms - Host latency: 12.5607 ms (end to end 12.5708 ms, enqueue 3.06226 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3215 ms - Host latency: 12.5708 ms (end to end 12.5816 ms, enqueue 2.94758 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3249 ms - Host latency: 12.5744 ms (end to end 12.585 ms, enqueue 3.04656 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3104 ms - Host latency: 12.5584 ms (end to end 12.5685 ms, enqueue 3.04551 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3115 ms - Host latency: 12.5603 ms (end to end 12.5721 ms, enqueue 3.01611 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3172 ms - Host latency: 12.5652 ms (end to end 12.5762 ms, enqueue 3.00647 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3176 ms - Host latency: 12.5664 ms (end to end 12.5785 ms, enqueue 2.98127 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3139 ms - Host latency: 12.5638 ms (end to end 12.575 ms, enqueue 2.95164 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3098 ms - Host latency: 12.5583 ms (end to end 12.5689 ms, enqueue 3.04917 ms)
[03/15/2023-15:20:07] [I] Average on 10 runs - GPU latency: 12.3193 ms - Host latency: 12.5671 ms (end to end 12.5771 ms, enqueue 2.9856 ms)
[03/15/2023-15:20:07] [I] 
[03/15/2023-15:20:07] [I] === Performance summary ===
[03/15/2023-15:20:07] [I] Throughput: 79.5293 qps
[03/15/2023-15:20:07] [I] Latency: min = 12.521 ms, max = 12.6267 ms, mean = 12.5634 ms, median = 12.5608 ms, percentile(99%) = 12.6165 ms
[03/15/2023-15:20:07] [I] End-to-End Host Latency: min = 12.536 ms, max = 12.6343 ms, mean = 12.5739 ms, median = 12.5721 ms, percentile(99%) = 12.6284 ms
[03/15/2023-15:20:07] [I] Enqueue Time: min = 1.94434 ms, max = 3.63049 ms, mean = 3.09275 ms, median = 3.07898 ms, percentile(99%) = 3.46201 ms
[03/15/2023-15:20:07] [I] H2D Latency: min = 0.146362 ms, max = 0.155029 ms, mean = 0.14786 ms, median = 0.147888 ms, percentile(99%) = 0.149414 ms
[03/15/2023-15:20:07] [I] GPU Compute Time: min = 12.2688 ms, max = 12.3779 ms, mean = 12.3148 ms, median = 12.3128 ms, percentile(99%) = 12.3662 ms
[03/15/2023-15:20:07] [I] D2H Latency: min = 0.0856934 ms, max = 0.105591 ms, mean = 0.100774 ms, median = 0.100952 ms, percentile(99%) = 0.104614 ms
[03/15/2023-15:20:07] [I] Total Host Walltime: 3.03033 s
[03/15/2023-15:20:07] [I] Total GPU Compute Time: 2.96787 s
[03/15/2023-15:20:07] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/15/2023-15:20:07] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=yolov8n.onnx

Thanks.

Hi,
In our project we are using JetPack 4.5.1, but we are considering upgrading to the newest version. I tried the conversion on JetPack 4.5.1 instead of 4.6.3; I will try 4.6.3 as you said.
Sorry for the confusion: when you said that I could use trtexec, I simply dismissed the newer JetPack version.
I have just tried it on 4.6.3, and it generated the file! But do you think the conversion could also work on 4.5.1? The error reported above was obtained with that JetPack version.
Thanks

Hi,

Out of curiosity, are you following any tutorial to implement YOLOv8? I'm after something better than YOLOv4, ready to go if possible :).
Thanks

Hi,

Based on the log, your model relies on some newer TensorRT features that are only available in the more recent release.
So please use JetPack 4.6.3 for inference.
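Once you have the engine, here is a minimal sketch (assuming pycuda is installed and the engine file is named output.engine) of deserializing and running it with the TensorRT 8.2 Python API:

# Load the serialized engine and run one inference with random input.
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("output.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host and device buffers for every binding (inputs and outputs).
bindings, buffers = [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.empty(trt.volume(shape), dtype=dtype)
    device = cuda.mem_alloc(host.nbytes)
    bindings.append(int(device))
    buffers.append((host, device, engine.binding_is_input(i)))

# Fill the input with dummy data, run, and copy the results back.
for host, device, is_input in buffers:
    if is_input:
        host[:] = np.random.rand(host.size).astype(host.dtype)
        cuda.memcpy_htod(device, host)
context.execute_v2(bindings)
for host, device, is_input in buffers:
    if not is_input:
        cuda.memcpy_dtoh(host, device)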

Thanks.
