TensorRT Batch Inference: different results



I am working with YOLOv4 detection models for my project. I use AlexeyAB’s darknet fork to train custom YOLOv4 detection models. For TensorRT conversion, I use Tianxiaomo’s pytorch-YOLOv4 to parse the darknet models into PyTorch and then export to ONNX with torch.onnx.export.

The issue is that inference with batch size 1 works fine, but for batch size > 1 the inference results differ depending on the TensorRT conversion method.

Method#1 : Conversion using trtexec

trtexec --onnx=onnx_models/yolov4tiny_2_3_416_416_static.onnx --maxBatch=2 --fp16  --saveEngine=trt_models/trtexec/yolov4tiny2-416.trt

For testing, I created a NumPy batch (batch size 2) by duplicating a single image. However, the inference results for the two frames are not the same: 5 objects are detected in the first frame but 11 objects in the second, despite both frames being the same image.
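For context, the duplicated-image batch can be built along the following lines (a minimal NumPy sketch; the preprocessing details are assumptions, not the exact script):

```python
import numpy as np

# Sketch: build a batch of 2 identical frames from one preprocessed image.
# Preprocessing (resize to 416x416, CHW layout, [0, 1] scaling) is assumed.
img = np.random.rand(3, 416, 416).astype(np.float32)  # stand-in for the real image
batch = np.stack([img, img])                          # shape (2, 3, 416, 416)

# Since both frames are identical, the raw network outputs should match too;
# after inference, a quick sanity check would be:
#   assert np.allclose(outputs[0], outputs[1])
```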

Method#2 : Conversion using TensorRT Python API

EXPLICIT_BATCH = []
if trt.__version__[0] >= '7':
    EXPLICIT_BATCH.append(
        1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

def build_engine(onnx_file_path, engine_file_path, verbose=False, batch_size=1):
    """Takes an ONNX file and creates a TensorRT engine."""
    TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE) if verbose else trt.Logger()
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(*EXPLICIT_BATCH) as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:

        if trt.__version__[0] >= '8':
            config = builder.create_builder_config()
            config.max_workspace_size = 1 << 28
            config.set_flag(trt.BuilderFlag.FP16)
            # config.set_flag(trt.BuilderFlag.STRICT_TYPES)
        else:
            builder.max_workspace_size = 1 << 28
            builder.max_batch_size = batch_size
            builder.fp16_mode = True
            # builder.strict_type_constraints = True

        # Parse model file
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        if trt.__version__[0] >= '7':
            # Reshape the network input to the desired batch size
            shape = list(network.get_input(0).shape)
            shape[0] = batch_size
            network.get_input(0).shape = shape
        print('Completed parsing of ONNX file')

        print('Building an engine; this may take a while...')
        if trt.__version__[0] >= '8':
            engine = builder.build_engine(network, config)
        else:
            engine = builder.build_cuda_engine(network)
        print('Completed creating engine')
        with open(engine_file_path, 'wb') as f:
            f.write(engine.serialize())
        return engine

The TensorRT model converted with the Python API produces different results from the trtexec one: it yields 11 detections for the first image in the batch (batch size 2) but empty detections for the second, similar to this issue by @thomallain. The detection outputs here only concern the first frame; all the detector arrays for the second frame are zeros.
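One way an all-zero second frame can arise is when the host output buffer is sized for the full batch but the engine effectively only writes the first frame's slice. A small NumPy illustration of that symptom (the detection-head sizes below are made up for illustration, not the real YOLOv4-tiny dimensions):

```python
import numpy as np

# Illustrative sizes only -- not the real YOLOv4-tiny head dimensions.
batch_size, dets_per_frame, fields = 2, 2535, 6
host_out = np.zeros(batch_size * dets_per_frame * fields, dtype=np.float32)

# If the engine effectively runs batch 1, only the first frame's slice is filled:
host_out[:dets_per_frame * fields] = 0.5  # simulate the device-to-host copy

frames = host_out.reshape(batch_size, dets_per_frame, fields)
print(np.count_nonzero(frames[0]) > 0)  # True -- first frame has detections
print(np.count_nonzero(frames[1]))      # 0    -- second frame is all zeros
```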


TensorRT Version:
GPU Type: GTX 1070
Nvidia Driver Version: 465.31
CUDA Version:
CUDNN Version: 8.2.2
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.5
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

File Structure
├── onnx_models
│ ├── yolov4_-1_3_416_416_dynamic.onnx
│ ├── yolov4_1_3_416_416_static.onnx
│ └── yolov4_2_3_416_416_static.onnx
├── TownCenter-008.jpg
├── trt_convert.py
├── trt_models
│ ├── pythonapi
│ │ └── yolov4tiny2-416.trt
│ └── trtexec
│ └── yolov4tiny2-416.trt
└── yolov4-tiny.py

Relevant scripts, ONNX models, and converted TensorRT models can be downloaded via this Google Drive: issue - Google Drive

Steps To Reproduce

  • Conversion via trtexec can be done with the aforementioned method.
  • Conversion with the Python API can be done with trt_convert.py by passing the desired ONNX model as a parameter.
  • Inference can be done with yolov4-tiny.py by passing the desired converted TensorRT model as a parameter.
python yolov4-tiny.py --trt trt_models/trtexec/yolov4tiny2-416.trt --conf_threshold 0.3 --nms_threshold 0.4 --num_classes 1 --batch_size 2

Could you share with me some suggestions on how to fix this error so that batch inference runs as expected?

Thanks in advance!

Best Regards,

We request you to share the ONNX model and the script, if not already shared, so that we can assist you better.
Alongside, you can try a few things:

  1. Validating your model with the below snippet:

import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, we request you to share the trtexec "--verbose" log for further debugging.

@NVES Hi, thanks for the reply.

I have already added a link to download all required files to reproduce this error via Google Drive. I’ve also attached the text file which contains verbose outputs while converting the model with trtexec method.

check_model.py also runs fine with the ONNX model that I’ve attached in the Google Drive link.
verbose.txt (1.3 MB)


Sorry for the delayed response. Are you using dynamic shape input? Since you are working with batches, please make sure you are marking the dynamic shapes correctly. It looks like you are also not specifying the input layer name and the dynamic shapes. Please refer to the following,
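For reference, with the TensorRT Python API a dynamic batch dimension is handled through an optimization profile rather than by reshaping the network input. A build-configuration sketch, assuming the dynamic model `yolov4_-1_3_416_416_dynamic.onnx` is used and that the input tensor name is taken from the parsed network (adjust the shapes to your use case):

```python
import tensorrt as trt

# Sketch (assumptions): dynamic ONNX model with batch dim = -1; paths and
# shape ranges are illustrative, not from the original scripts.
TRT_LOGGER = trt.Logger()
EXPLICIT_BATCH = [1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)]

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(*EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open('onnx_models/yolov4_-1_3_416_416_dynamic.onnx', 'rb') as f:
        parser.parse(f.read())

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28

    # One optimization profile covering the batch sizes the engine must accept.
    profile = builder.create_optimization_profile()
    name = network.get_input(0).name  # actual input layer name from the model
    profile.set_shape(name,
                      (1, 3, 416, 416),   # min
                      (2, 3, 416, 416),   # opt: batch size to optimize for
                      (4, 3, 416, 416))   # max
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)

# At inference time, set the actual batch shape before executing:
#   context.set_binding_shape(0, (2, 3, 416, 416))
```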