Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only):
• TensorRT Version: 10.3.0
• NVIDIA GPU Driver Version (valid for GPU only): 535.161.08
• Issue Type (questions, new requirements, bugs):
• How to reproduce the issue? (This is for bugs. Include the sample app used, the configuration file contents, the command line, and other details for reproducing):
• Requirement details (This is for new requirements. Include the module name, the plugin or sample application, and the function description):
Hi,
I’m trying to create TensorRT engine files from a YOLOv8s-seg ONNX file with the input size changed to 1920x1088. The same process works in DeepStream 6.3 but fails with DeepStream 7.1 (it also fails in 7.0). The process goes like this:
first, I use utils/export_yoloV8_seg.py from the repository https://github.com/marcoslucianops/DeepStream-Yolo-Seg (NVIDIA DeepStream implementation for YOLO-Segmentation models)
like this:
python utils/export_yoloV8_seg.py -w yolov8s-seg.pt -s 1088 1920
this succeeds both in DeepStream 6.3 and in 7.1 (for 6.3 I run it in a Docker container based on nvcr.io/nvidia/deepstream:6.3-gc-triton-devel, and for 7.1 in a container based on nvcr.io/nvidia/deepstream:7.1-triton-multiarch)
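For reference, the containers are launched roughly like this (the mount path below is illustrative, not the exact command I used):

# mount path is illustrative; the container needs GPU access for trtexec
docker run --gpus all -it --rm -v /path/to/models:/workspace nvcr.io/nvidia/deepstream:7.1-triton-multiarch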
then, using trtexec, I try to create the engine like this:
trtexec --onnx=yolov8s-seg_1088_1920.onnx --saveEngine=yolov8s-seg_1088_1920.onnx-fp32-1088_1920-batch-1.engine
this command succeeds in 6.3 but fails in 7.1 with this message:
[12/08/2024-17:34:07] [E] Error[1]: [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[12/08/2024-17:34:07] [W] [TRT] Requested amount of GPU memory (805441351680 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[12/08/2024-17:34:07] [E] Error[1]: IBuilder::buildSerializedNetwork: Error Code 1: Myelin ([tunable_graph.cpp:create:117] autotuning: User allocator error allocating 805441351680-byte buffer)
[12/08/2024-17:34:07] [E] Engine could not be created from network
[12/08/2024-17:34:07] [E] Building engine failed
[12/08/2024-17:34:07] [E] Failed to create engine from model or file.
[12/08/2024-17:34:07] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # trtexec --onnx=yolov8s-seg_1088_1920.onnx --saveEngine=yolov8s-seg_1088_1920.onnx-fp32-1088_1920-batch-1.engine
To address the out-of-memory error you’re encountering while creating a TensorRT engine from the YOLOv8 ONNX model in DeepStream 7.1, let’s go through the potential causes and some suggested solutions:
Possible Causes of Out-of-Memory Error
Insufficient GPU Memory:
The requested allocation (805,441,351,680 bytes, roughly 750 GiB) far exceeds the memory of any single GPU, so the builder’s request cannot be satisfied and the build fails with an out-of-memory error.
Model Complexity:
At a 1920x1088 input, YOLOv8-seg produces large intermediate tensors, and some builder tactics request correspondingly large scratch buffers while the engine is being built.
Memory Management Changes:
DeepStream 6.3 and 7.1 ship different TensorRT major versions (8.x vs. 10.x), and the newer builder can select different tactics and allocate scratch memory differently, which can surface as a much larger allocation request for the same network.
Other Processes Using the GPU:
Other applications or processes might be consuming GPU memory, leaving insufficient resources for your TensorRT engine build.
Suggested Solutions
Check the Batch Size and Input Shapes:
The trtexec command above already builds for batch 1, so there is no smaller batch size to drop to. Instead, make sure the batch and input dimensions are pinned explicitly when invoking trtexec, and cap the builder workspace so a single tactic cannot request an unbounded scratch buffer; see the sketch below.
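A minimal sketch, assuming the exported ONNX names its input tensor input and has a dynamic batch dimension (check with Netron or polygraphy inspect model); the 4096 MiB workspace cap is an illustrative value that limits how much scratch memory any single builder tactic may request:

trtexec --onnx=yolov8s-seg_1088_1920.onnx \
        --saveEngine=yolov8s-seg_1088_1920.engine \
        --shapes=input:1x3x1088x1920 \
        --memPoolSize=workspace:4096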
Optimize the ONNX Model:
Before converting, simplify the ONNX graph by folding constants and fusing layers where possible, for example with onnx-simplifier, as shown below.
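A minimal sketch, assuming onnx-simplifier is installed from PyPI; the file names follow the ones used above:

pip install onnxsim
onnxsim yolov8s-seg_1088_1920.onnx yolov8s-seg_1088_1920-sim.onnx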
Profile GPU Memory Usage:
Use tools like nvidia-smi to monitor GPU memory usage while running the TensorRT commands. This will help identify if other processes are consuming memory.
watch -n 1 nvidia-smi
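If you prefer a machine-readable trace of memory over time, the standard nvidia-smi query options work as well:

nvidia-smi --query-gpu=memory.used,memory.free --format=csv -l 1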
Upgrade Drivers and TensorRT:
Make sure you’re on a current NVIDIA driver and TensorRT release; compatibility fixes and builder improvements sometimes resolve issues like this.
Your setup reports driver 535.161.08 and TensorRT 10.3.0, so check whether a newer driver branch or a newer TensorRT 10.x release is available.
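To confirm what is actually inside the 7.1 container, a few standard checks (assuming the TensorRT Python bindings are installed there):

nvidia-smi                                                  # driver version and GPU visibility
python3 -c "import tensorrt; print(tensorrt.__version__)"   # TensorRT Python package version
dpkg -l | grep -i nvinfer                                   # TensorRT libraries installed via apt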
Clean Up GPU Resources:
Restart the system or the container to ensure all previous memory allocations are cleared if you’ve run multiple tests or failed builds.
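Before retrying the build, a quick check that nothing is still holding GPU memory (standard nvidia-smi query):

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv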
Use Profiling Tools:
Run the build with TensorRT’s verbose logging to see which layer or tactic triggers the oversized allocation; that usually narrows the problem down to a specific part of the graph. See the sketch below.
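A sketch using standard trtexec flags: --verbose prints builder-level logs and --profilingVerbosity=detailed adds per-layer detail; the tee target is just an illustrative file name:

trtexec --onnx=yolov8s-seg_1088_1920.onnx --verbose --profilingVerbosity=detailed 2>&1 | tee build.log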
Consider Increasing GPU Memory:
If the hardware allows, running the build on a GPU with more memory can help when you are genuinely close to the limit; note, however, that no single GPU can satisfy a ~750 GiB request, so the options above are more likely to address the root cause here.
By following these steps and suggestions, you should be able to mitigate the out-of-memory error and successfully create the TensorRT engine file from your YOLOv8 ONNX model while utilizing DeepStream 7.1.