Unable to create TensorRT engine file for yolov8s-seg model in DeepStream 7.1

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 7.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 10.3.0
• NVIDIA GPU Driver Version (valid for GPU only) 535.161.08
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hi,
I’m trying to create TensorRT engine files from a yolov8s-seg ONNX file while changing the input size to 1920x1088. The same process works in DeepStream 6.3 but fails with DeepStream 7.1 (it also fails in 7.0). The process goes like this:
First, I export the ONNX file using utils/export_yoloV8_seg.py from the repository GitHub - marcoslucianops/DeepStream-Yolo-Seg (NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO-Segmentation models), like this:
python utils/export_yoloV8_seg.py -w yolov8s-seg.pt -s 1088 1920
This step succeeds in both DeepStream 6.3 and 7.1 (for 6.3 I run it in a container based on nvcr.io/nvidia/deepstream:6.3-gc-triton-devel, and for 7.1 in one based on nvcr.io/nvidia/deepstream:7.1-triton-multiarch).
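For completeness, this is roughly how such a container can be started (illustrative only; the exact mounts and flags may differ from my setup):
docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/deepstream:7.1-triton-multiarch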
Then I try to build the engine with trtexec:
trtexec --onnx=yolov8s-seg_1088_1920.onnx --saveEngine=yolov8s-seg_1088_1920.onnx-fp32-1088_1920-batch-1.engine
This command succeeds in 6.3 but fails in 7.1 with this message:

[12/08/2024-17:34:07] [E] Error[1]: [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[12/08/2024-17:34:07] [W] [TRT] Requested amount of GPU memory (805441351680 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[12/08/2024-17:34:07] [E] Error[1]: IBuilder::buildSerializedNetwork: Error Code 1: Myelin ([tunable_graph.cpp:create:117] autotuning: User allocator error allocating 805441351680-byte buffer)
[12/08/2024-17:34:07] [E] Engine could not be created from network
[12/08/2024-17:34:07] [E] Building engine failed
[12/08/2024-17:34:07] [E] Failed to create engine from model or file.
[12/08/2024-17:34:07] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # trtexec --onnx=yolov8s-seg_1088_1920.onnx --saveEngine=yolov8s-seg_1088_1920.onnx-fp32-1088_1920-batch-1.engine

thanks

Could you attach this pt file so that we can try that on our side?

yolov8s-seg.pt.zip (21.0 MB)
Attached is a zip file of yolov8s-seg.pt.
this is the md5 of the pt file:
3ce94748243e201a29a9c11587777c59

thanks

Could you describe how to set up the environment for this script (export_yoloV8_seg.py)?

these are the requirements:

certifi==2024.8.30
charset-normalizer==3.4.0
contourpy==1.3.1
cycler==0.12.1
filelock==3.16.1
fonttools==4.55.2
fsspec==2024.10.0
idna==3.10
Jinja2==3.1.4
kiwisolver==1.4.7
MarkupSafe==3.0.2
matplotlib==3.9.3
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.0
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
onnx==1.17.0
opencv-python==4.10.0.84
packaging==24.2
pandas==2.2.3
pillow==11.0.0
protobuf==5.29.1
psutil==6.1.0
py-cpuinfo==9.0.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
scipy==1.14.1
seaborn==0.13.2
six==1.17.0
sympy==1.13.1
torch==2.5.1
torchvision==0.20.1
tqdm==4.67.1
triton==3.1.0
typing_extensions==4.12.2
tzdata==2024.2
ultralytics==8.3.48
ultralytics-thop==2.0.13
urllib3==2.2.3
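If it helps, a minimal way to reproduce the environment (assuming the list above is saved as requirements.txt) is:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python utils/export_yoloV8_seg.py -w yolov8s-seg.pt -s 1088 1920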

To address the out-of-memory error you’re encountering while creating a TensorRT engine from a YOLOv8 ONNX model in DeepStream 7.1, let’s go through the potential causes and some suggested solutions:

Possible Causes of the Out-of-Memory Error

  1. Insufficient GPU Memory:
  • The requested allocation (805441351680 bytes, roughly 750 GiB) far exceeds the memory of any single GPU, so the builder’s request cannot be satisfied.
  2. Model Complexity:
  • YOLOv8 segmentation models can require extensive memory during engine building and inference, especially at a 1920x1088 input size.
  3. Memory Management Changes:
  • Differences in memory management between the TensorRT versions shipped with DeepStream 6.3 and 7.1 can result in more stringent memory requirements during engine building.
  4. Competing Processes:
  • Other applications or processes might be consuming GPU memory, leaving insufficient resources for your TensorRT engine build.
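To rule out the simple cases first, you can check total, used, and free GPU memory directly (standard nvidia-smi query options):
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv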

Suggested Solutions

  1. Reduce Batch Size:
  • Keep the batch size as small as possible when invoking trtexec. Your command already builds with batch size 1, so make sure any later multi-batch builds stay within your GPU’s memory:
trtexec --onnx=yolov8s-seg_1088_1920.onnx --saveEngine=yolov8s-seg_1088_1920.onnx-fp32-1088_1920-batch-1.engine
  2. Mixed Precision:
  • Use mixed precision (FP16) to reduce memory requirements significantly. Add the --fp16 flag to your command (see the combined example after this list):
trtexec --onnx=yolov8s-seg_1088_1920.onnx --fp16 --saveEngine=yolov8s-seg_1088_1920.onnx-fp16-1088_1920-batch-1.engine
  3. Optimize the Model:
  • Before converting, optimize your ONNX model by minimizing or fusing layers where possible. Tools such as onnx-simplifier or ONNX GraphSurgeon can help reduce its size and complexity.
  4. Profile GPU Memory Usage:
  • Use nvidia-smi to monitor GPU memory usage while running the TensorRT commands. This will help identify whether other processes are consuming memory:
watch -n 1 nvidia-smi
  5. Upgrade Drivers and TensorRT:
  • Ensure you’re on the latest NVIDIA driver and TensorRT version. Compatibility improvements and updated memory management might resolve the issue; check whether there are newer releases than the TensorRT 10.3.0 that ships with DeepStream 7.1.
  6. Clean Up GPU Resources:
  • Restart the system or the container to clear previous memory allocations if you’ve run multiple tests or failed builds.
  7. Use Profiling Tools:
  • Use TensorRT’s profiling options to gain insight into memory usage during engine building. Profiling may reveal unoptimized sections of your model or excessive memory requirements.
  8. Consider a GPU with More Memory:
  • If the hardware allows, running the build on a GPU with more memory can also work around the limit.
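As a starting point, here is a sketch of a build command that combines FP16 with an explicit workspace limit (the 4096 MiB value is an assumption; adjust it to the free memory on your GPU):
trtexec --onnx=yolov8s-seg_1088_1920.onnx --fp16 --memPoolSize=workspace:4096 --saveEngine=yolov8s-seg_1088_1920.onnx-fp16-1088_1920-batch-1.engine
Capping the workspace pool bounds how much scratch memory the builder’s autotuner may request for a single tactic, which can help avoid allocation attempts like the ~750 GiB one in your log, at the cost of some tactics being skipped.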

By following these steps, you should be able to mitigate the out-of-memory error and successfully create the TensorRT engine file from your YOLOv8 ONNX model in DeepStream 7.1.