Load ONNX model with batch size

I am trying to load the attached model. I can't figure out how to correctly set the batch size of the model. It looks like the input is configured to have batch size = 8 (shape [8, 3, 640, 640]), but the outputs have batch size = 1 (shapes [1, 3, 80, 80, 85], [1, 3, 40, 40, 85], [1, 3, 20, 20, 85]). Why?

Python code

        trt_settings = dict(
            inputs=[
                {
                    "name": None,
                    "shape": [8, 3, 640, 640],
                },
            ],
            outputs=[
                {
                    "name": "classes",
                    "shape": [8, 3, 80, 80, 85],
                },
                {
                    "name": "boxes",
                    "shape": [8, 3, 40, 40, 85],
                },
                {
                    "name": "444",
                    "shape": [8, 3, 20, 20, 85],
                },
            ],
            # model_path=self.onnx_path
            model_path="yolov5s.onnx",
        )

        # Required for ONNX engines
        EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

        with trt.Builder(self.trt_logger) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(
            network, self.trt_logger
        ) as parser:
            try:
                builder.max_workspace_size = 1 << 30  # self.max_workspace_size
                builder.max_batch_size = 8

                builder.fp16_mode = True

                trt_required_inputs = trt_settings.get("inputs", list())
                trt_required_outputs = trt_settings.get("outputs", list())

                with open(trt_settings.get('model_path'), "rb") as model:
                    logger.info("Beginning ONNX file parsing...")
                    if not parser.parse(model.read()):
                        logger.error("ERROR: Failed to parse the ONNX file.")
                        for error in range(parser.num_errors):
                            logger.error(parser.get_error(error))
                        raise RuntimeError("Failed to parse the ONNX file.")

                for index, model_input in enumerate(trt_required_inputs):
                    network.get_input(index).shape = model_input["shape"]

                logger.info("Building TensorRT engine. This may take a few minutes.")
                return builder.build_cuda_engine(network)

            except Exception:
                logger.error("Error creating CUDA engine...")
                logger.error(traceback.format_exc())
                raise  # re-raise so the original traceback is preserved


Note: I know that Yolo v5 provides a TensorRT implementation written in C++. However, without going into too much detail, I need to convert the model from its ONNX version (I actually need to use Yolo v5m). If you are not familiar with Yolo v5, you can simply ignore this comment.
yolov5s.onnx.zip (23.7 MB)

Fixed by explicitly setting the batch size in the ONNX model.

Hi, we request you to share the ONNX model and the script so that we can assist you better.

Meanwhile, you can try validating your model with the snippet below:

check_model.py

import sys

import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

Alternatively, you can try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
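For example, a typical invocation for this case might look like the following (the file names are placeholders):

```shell
# Build an FP16 engine from the ONNX file and save it for reuse.
trtexec --onnx=yolov5s.onnx \
        --fp16 \
        --saveEngine=yolov5s.engine
```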

Thanks!