Converting yolov4 onnx model to TensorRT for multi batch input

Description

I’m looking to convert the yolov4 model from the ONNX Model Zoo to a TensorRT engine for use in DeepStream.
I want to run streaming inference from multiple sources, so I need to convert it to use a batch size > 1. I also have a question about the process.

Environment

TensorRT Version: 8.0.1 (v8001)
GPU Type: Jetson Xavier NX
Nvidia Driver Version: r32.6.1
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): r32.6.1-samples

Steps To Reproduce

Download yolov4 onnx file from onnx model zoo
Run trtexec with a batch size > 1 (following GitHub - isarsoft/yolov4-triton-tensorrt: This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server)
Test with:
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov4.engine --batch=4 --iterations=100 --avgRuns=10 --dumpProfile --dumpOutput --useCudaGraph

and I see the output:
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov4.engine --batch=4 --iterations=100 --avgRuns=10 --dumpProfile --dumpOutput --useCudaGraph
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4.engine --batch=4 --iterations=100 --avgRuns=10 --dumpProfile --dumpOutput --useCudaGraph --plugins=liblayerplugin.so
[11/17/2023-11:56:18] [I] === Model Options ===
[11/17/2023-11:56:18] [I] Format: *
[11/17/2023-11:56:18] [I] Model:
[11/17/2023-11:56:18] [I] Output:
[11/17/2023-11:56:18] [I] === Build Options ===
[11/17/2023-11:56:18] [I] Max batch: 4
[11/17/2023-11:56:18] [I] Workspace: 16 MiB
[11/17/2023-11:56:18] [I] minTiming: 1
[11/17/2023-11:56:18] [I] avgTiming: 8
[11/17/2023-11:56:18] [I] Precision: FP32
[11/17/2023-11:56:18] [I] Calibration:
[11/17/2023-11:56:18] [I] Refit: Disabled
[11/17/2023-11:56:18] [I] Sparsity: Disabled
[11/17/2023-11:56:18] [I] Safe mode: Disabled
[11/17/2023-11:56:18] [I] Restricted mode: Disabled
[11/17/2023-11:56:18] [I] Save engine:
[11/17/2023-11:56:18] [I] Load engine: yolov4.engine
[11/17/2023-11:56:18] [I] NVTX verbosity: 0
[11/17/2023-11:56:18] [I] Tactic sources: Using default tactic sources
[11/17/2023-11:56:18] [I] timingCacheMode: local
[11/17/2023-11:56:18] [I] timingCacheFile:
[11/17/2023-11:56:18] [I] Input(s)s format: fp32:CHW
[11/17/2023-11:56:18] [I] Output(s)s format: fp32:CHW
[11/17/2023-11:56:18] [I] Input build shapes: model
[11/17/2023-11:56:18] [I] Input calibration shapes: model
[11/17/2023-11:56:18] [I] === System Options ===
[11/17/2023-11:56:18] [I] Device: 0
[11/17/2023-11:56:18] [I] DLACore:
[11/17/2023-11:56:18] [I] Plugins: liblayerplugin.so
[11/17/2023-11:56:18] [I] === Inference Options ===
[11/17/2023-11:56:18] [I] Batch: 4
[11/17/2023-11:56:18] [I] Input inference shapes: model
[11/17/2023-11:56:18] [I] Iterations: 100
[11/17/2023-11:56:18] [I] Duration: 3s (+ 200ms warm up)
[11/17/2023-11:56:18] [I] Sleep time: 0ms
[11/17/2023-11:56:18] [I] Streams: 1
[11/17/2023-11:56:18] [I] ExposeDMA: Disabled
[11/17/2023-11:56:18] [I] Data transfers: Enabled
[11/17/2023-11:56:18] [I] Spin-wait: Disabled
[11/17/2023-11:56:18] [I] Multithreading: Disabled
[11/17/2023-11:56:18] [I] CUDA Graph: Enabled
[11/17/2023-11:56:18] [I] Separate profiling: Disabled
[11/17/2023-11:56:18] [I] Time Deserialize: Disabled
[11/17/2023-11:56:18] [I] Time Refit: Disabled
[11/17/2023-11:56:18] [I] Skip inference: Disabled
[11/17/2023-11:56:18] [I] Inputs:
[11/17/2023-11:56:18] [I] === Reporting Options ===
[11/17/2023-11:56:18] [I] Verbose: Disabled
[11/17/2023-11:56:18] [I] Averages: 10 inferences
[11/17/2023-11:56:18] [I] Percentile: 99
[11/17/2023-11:56:18] [I] Dump refittable layers:Disabled
[11/17/2023-11:56:18] [I] Dump output: Enabled
[11/17/2023-11:56:18] [I] Profile: Enabled
[11/17/2023-11:56:18] [I] Export timing to JSON file:
[11/17/2023-11:56:18] [I] Export output to JSON file:
[11/17/2023-11:56:18] [I] Export profile to JSON file:
[11/17/2023-11:56:18] [I]
[11/17/2023-11:56:18] [I] === Device Information ===
[11/17/2023-11:56:18] [I] Selected Device: Xavier
[11/17/2023-11:56:18] [I] Compute Capability: 7.2
[11/17/2023-11:56:18] [I] SMs: 6
[11/17/2023-11:56:18] [I] Compute Clock Rate: 1.109 GHz
[11/17/2023-11:56:18] [I] Device Global Memory: 7765 MiB
[11/17/2023-11:56:18] [I] Shared Memory per SM: 96 KiB
[11/17/2023-11:56:18] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/17/2023-11:56:18] [I] Memory Clock Rate: 1.109 GHz
[11/17/2023-11:56:18] [I]
[11/17/2023-11:56:18] [I] TensorRT version: 8001
[11/17/2023-11:56:18] [I] Loading supplied plugin library: liblayerplugin.so
[11/17/2023-11:56:20] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 505, GPU 5471 (MiB)
[11/17/2023-11:56:20] [I] [TRT] Loaded engine size: 133 MB
[11/17/2023-11:56:20] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 505 MiB, GPU 5471 MiB
[11/17/2023-11:56:23] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +223, GPU +287, now: CPU 743, GPU 5906 (MiB)
[11/17/2023-11:56:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +399, now: CPU 1050, GPU 6305 (MiB)
[11/17/2023-11:56:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1050, GPU 6292 (MiB)
[11/17/2023-11:56:25] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1050 MiB, GPU 6292 MiB
[11/17/2023-11:56:25] [I] Engine loaded in 6.80832 sec.
[11/17/2023-11:56:25] [W] Profiler does not work when CUDA graph is enabled. Ignored --useCudaGraph flag and disabled CUDA graph.
[11/17/2023-11:56:25] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 917 MiB, GPU 6158 MiB
[11/17/2023-11:56:25] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +4, now: CPU 917, GPU 6162 (MiB)
[11/17/2023-11:56:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 917, GPU 6172 (MiB)
[11/17/2023-11:56:25] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 920 MiB, GPU 6371 MiB
[11/17/2023-11:56:25] [I] Created input binding for input with dimensions 3x608x608
[11/17/2023-11:56:25] [I] Created output binding for detections with dimensions 159201x1x1
[11/17/2023-11:56:25] [I] Starting inference
[11/17/2023-11:56:25] [E] Error[3]: [executionContext.cpp::enqueue::276] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueue::276, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 4, but engine max batch size was: 1

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below

check_model.py

import sys
import onnx

# Usage: python3 check_model.py <path/to/model.onnx>
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
  2. Try building and running your model with the trtexec command (a rough example follows below).
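For a multi-batch engine, the ONNX model needs to be parsed in explicit-batch mode with an optimization profile that covers the batch sizes you want. A rough sketch, assuming the network input is named input with a 3x416x416 shape (adjust the input name, shapes, and file paths to your model):

/usr/src/tensorrt/bin/trtexec --onnx=yolov4.onnx --saveEngine=yolov4_dynamic.engine --minShapes=input:1x3x416x416 --optShapes=input:4x3x416x416 --maxShapes=input:8x3x416x416 --workspace=2048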

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

The model is the yolov4 model from the ONNX Model Zoo: GitHub - onnx/models: A collection of pre-trained, state-of-the-art models in the ONNX format

Running the check_model script produces no output with onnx 1.13.1.
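For reference, I invoked the check along these lines (assuming the script takes the model path as its first argument):

python3 check_model.py /data/models/yolov4_onnxmodelzoo.onnx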

Running:
/usr/src/tensorrt/bin/trtexec --onnx=/data/models/yolov4_onnxmodelzoo.onnx --minShapes=input:1x3x416x416 --optShapes=input:16x3x416x416 --maxShapes=input:32x3x416x416 --shapes=input:5x3x416x416
gives:
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=/data/models/yolov4_onnxmodelzoo.onnx --minShapes=input:1x3x416x416 --optShapes=input:16x3x416x416 --maxShapes=input:32x3x416x416 --shapes=input:5x3x416x416
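If I understand correctly, adding --saveEngine to the command above would persist the multi-batch engine, and a batch of 4 would then be requested at run time with --shapes rather than --batch, since the engine has an explicit batch dimension. Roughly (the engine file name is just illustrative):

/usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_dynamic.engine --shapes=input:4x3x416x416 --iterations=100 --avgRuns=10 --dumpProfile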

Yes, the engine should be created on the same machine; however, the ONNX model can be imported.
I see that the model passed on the second run. Can you give a brief explanation of the error?

Thanks