Convert YOLOv7 QAT model to TensorRT engine failure

Description

Following yolo_deepstream/tree/main/tensorrt_yolov7, I use “yolov7QAT” to perform a batch detection task, and the following error occurs:
./build/detect --engine=yolov7QAT.engine --img=./imgs/horses.jpg,./imgs/zidane.jpg

Error Message

input 2 images, paths: ./imgs/horses.jpg, ./imgs/zidane.jpg, 
--------------------------------------------------------
Yolov7 initialized from: /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/yolov7/yolov7QAT.engine
input : images , shape : [ 1,3,640,640,]
output : outputs , shape : [ 1,25200,85,]
--------------------------------------------------------
preprocess start
error cv_img.size() in preProcess
 error: mImgPushed = 1 numImg = 1 mMaxBatchSize= 1, mImgPushed + numImg > mMaxBatchSize 
inference start
postprocessing start
detectec image written to: ./imgs/horses.jpgdetect0.jpg

Note

  • It works fine when running a single detection task with “yolov7QAT.engine”.
  • “yolov7QAT.engine” comes from converting yolov7_qat_640.onnx (NVIDIA-AI-IOT/yolo_deepstream/tensorrt_yolov7):
    /usr/src/tensorrt/bin/trtexec --onnx=yolov7_qat_640.onnx --saveEngine=yolov7QAT.engine --fp16 --int8
  • Whether “yolov7_qat_640.onnx” is downloaded from “NVIDIA-AI-IOT/yolo_deepstream/yolov7_qat” or self-trained (both show the same structure in Netron), running ./build/detect produces the same error message
  • Runs fine with the non-QAT “yolov7db4fp32.engine” or “yolov7db4fp16.engine”

Environment

TensorRT Version: 5.1
GPU Type: Jetson AGX Xavier
Nvidia Driver Version:
CUDA Version: 11.4.315
CUDNN Version: 8.6.0.166
Operating System + Version: L4T 35.2.1 (Jetpack 5.1)
Python Version (if applicable): Python 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.12.0a0+2c916ef.nv22.3
Baremetal or Container (if container which image + tag):

Steps To Reproduce

Follow yolo_deepstream/tensorrt_yolov7 (NVIDIA-AI-IOT/yolo_deepstream on GitHub).

That is a very low version of TensorRT.


I may have accidentally downgraded TensorRT during my own installation

The original default in Jetpack 5.1 should have been 8.5.2.2

Thanks for the reminder, I’ll try again after I upgrade.

Hi,
Please share the ONNX model and the script, if not already shared, so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!


When I run the dpkg -l | grep -i tensor command, I get the following output, so my TensorRT should be 8.5.2.2, no problem:

ii  graphsurgeon-tf                            8.5.2-1+cuda11.4                     arm64        GraphSurgeon for TensorRT package
ii  libnvinfer-bin                             8.5.2-1+cuda11.4                     arm64        TensorRT binaries
ii  libnvinfer-dev                             8.5.2-1+cuda11.4                     arm64        TensorRT development libraries and headers
ii  libnvinfer-plugin-dev                      8.5.2-1+cuda11.4                     arm64        TensorRT plugin libraries
ii  libnvinfer-plugin8                         8.5.2-1+cuda11.4                     arm64        TensorRT plugin libraries
ii  libnvinfer-samples                         8.5.2-1+cuda11.4                     all          TensorRT samples
ii  libnvinfer8                                8.5.2-1+cuda11.4                     arm64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                       8.5.2-1+cuda11.4                     arm64        TensorRT ONNX libraries
ii  libnvonnxparsers8                          8.5.2-1+cuda11.4                     arm64        TensorRT ONNX libraries
ii  libnvparsers-dev                           8.5.2-1+cuda11.4                     arm64        TensorRT parsers libraries
ii  libnvparsers8                              8.5.2-1+cuda11.4                     arm64        TensorRT parsers libraries
ii  nvidia-tensorrt                            5.1-b147                             arm64        NVIDIA TensorRT Meta Package
ii  nvidia-tensorrt-dev                        5.1-b147                             arm64        NVIDIA TensorRT dev Meta Package
ii  python3-libnvinfer                         8.5.2-1+cuda11.4                     arm64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                     8.5.2-1+cuda11.4                     arm64        Python 3 development package for TensorRT
ii  tensorrt                                   8.5.2.2-1+cuda11.4                   arm64        Meta package for TensorRT
ii  tensorrt-libs                              8.5.2.2-1+cuda11.4                   arm64        Meta package for TensorRT runtime libraries
ii  uff-converter-tf                           8.5.2-1+cuda11.4                     arm64        UFF converter for TensorRT package

But when I use the jtop command, I get the message “TensorRT: 5.1”.
Which version do I have?

Hi,

We are able to successfully build the TensorRT engine on version 8.6.
Please make sure you’re using the latest TensorRT version. For better help, we are moving this post to the Jetson AGX Xavier forum.

Thank you.


Hi,

Which batch size do you use?
Based on their doc, it only supports the batch-size=1 use case.
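
For reference, the “mImgPushed + numImg > mMaxBatchSize” message in your log matches this kind of guard (an illustrative Python sketch, not the sample’s actual C++ code, assuming it simply rejects pushes that exceed the engine’s max batch size):

class BatchBuffer:
    def __init__(self, max_batch_size):
        # max_batch_size comes from the engine; 1 for the default yolov7QAT.engine
        self.max_batch_size = max_batch_size
        self.imgs_pushed = 0

    def push(self, imgs):
        # Reject the push if it would exceed the engine's batch capacity
        if self.imgs_pushed + len(imgs) > self.max_batch_size:
            raise RuntimeError(
                "mImgPushed = %d numImg = %d mMaxBatchSize = %d"
                % (self.imgs_pushed, len(imgs), self.max_batch_size))
        self.imgs_pushed += len(imgs)

buf = BatchBuffer(max_batch_size=1)
buf.push(["horses.jpg"])  # fills the single batch slot
buf.push(["zidane.jpg"])  # raises, like ./build/detect with two images

With an engine built for batch size 1, a second image cannot be queued, which is why the single-image case works.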

Thanks.


Hi, AakankshaS

No matter whether I use

  1. check_model.py
    tested with “yolov7_qat_640.onnx”

or

  2. the trtexec command
    /usr/src/tensorrt/bin/trtexec --loadEngine=yolov7QAT.engine --verbose
    /usr/src/tensorrt/bin/trtexec --loadEngine=yolov7QAT.engine --verbose --batch=12

there are no error messages.

@AastaLLL
I loaded the model with trtexec, explicitly specifying the batch size, and it executed successfully:
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov7QAT.engine --verbose --batch=12

However, when I change the “batch-size” parameter in the DeepStream configuration file “pgie_yolov7_config.txt”:
batch-size=1, execution succeeds
batch-size=2, execution fails

[property]
model-engine-file=... /... /... /... /samples/models/tao_pretrained_models/yolov7/yolov7QAT.engine
force-implicit-batch-dim=1
batch-size=1

Therefore, I suspect that the “.onnx > .engine” step did not set the “dynamic-batch” correctly, causing the above problem.
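
One way to check whether an engine was built with a dynamic batch dimension is to inspect its bindings with the TensorRT Python API (a minimal sketch using the TensorRT 8.x binding calls; a batch dimension of -1 means dynamic batch, a fixed 1 means only batch-size=1 inference will work):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov7QAT.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    # Print each binding's name and shape, e.g. images (-1, 3, 640, 640)
    for i in range(engine.num_bindings):
        print(engine.get_binding_name(i), engine.get_binding_shape(i))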

I ran the following command to convert model.onnx to model.engine again, explicitly specifying the dynamic batch shapes, and the problem was solved!

/usr/src/tensorrt/bin/trtexec --onnx=yolov7_qat_640.onnx \
--minShapes=images:1x3x640x640 \
--optShapes=images:12x3x640x640 \
--maxShapes=images:16x3x640x640 \
--saveEngine=yolov7_qat_640.engine --fp16 --int8
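
Note that --minShapes/--optShapes/--maxShapes only take effect if the images input in the ONNX model already has a dynamic (symbolic) batch dimension; that can be verified with the onnx package (a minimal sketch):

import onnx

model = onnx.load("yolov7_qat_640.onnx")
for inp in model.graph.input:
    # dim_param (a name) indicates a dynamic dimension; dim_value is a fixed size
    dims = [d.dim_param if d.dim_param else d.dim_value
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)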

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.