How to perform batch inference with explicit batch?

Description

I can’t seem to find a clear example of how to perform batch inference using explicit batch mode.
I see many outdated articles pointing to this example here, but looking at the code, it only uses a batch size of 1. Other examples I see use implicit batch mode, but that is now deprecated, so I need an example demonstrating explicit batch mode.

How can I use a batch size larger than 1?

I followed the sampleOnnxMNIST.cpp sample code to create the following:

You can assume that m_inputDims and m_outputDims are of type nvinfer1::Dims and already contain relevant information.

bool InferenceEngine::infer() {
    // Read the serialized engine file from disk
    std::ifstream file(m_enginePath, std::ios::binary | std::ios::ate);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);

    std::vector<char> buffer(size);
    if (!file.read(buffer.data(), size)) {
        throw std::runtime_error("Unable to read engine file");
    }

    std::unique_ptr<nvinfer1::IRuntime> runtime{nvinfer1::createInferRuntime(m_logger)};
    if (!runtime) {
        return false;
    }

    m_engine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(buffer.data(), buffer.size()));
    if (!m_engine) {
        return false;
    }

    // Create RAII buffer manager object
    samplesCommon::BufferManager buffers(m_engine); // TODO can specify the batch size in this call.

    auto context = std::unique_ptr<nvinfer1::IExecutionContext>(m_engine->createExecutionContext());
    if (!context) {
        return false;
    }

    size_t batchSize = 1;

    if (!processInput(buffers, batchSize)) {
        return false;
    }

    // Memcpy from host input buffers to device input buffers
    buffers.copyInputToDevice();

    bool status = context->executeV2(buffers.getDeviceBindings().data());
    if (!status) {
        return false;
    }

    // Memcpy from device output buffers to host output buffers
    buffers.copyOutputToHost();

    const int outputSize = m_outputDims.d[1];
    float* output = static_cast<float*>(buffers.getHostBuffer("2621"));

    for (int i = 0; i < outputSize; ++i) {
        std::cout << output[i] << " ";
    }

    std::cout << "\n\n\n" << std::endl;
    return true;
}

And here is the definition for the processInput function:

bool InferenceEngine::processInput(const samplesCommon::BufferManager &buffers, size_t batchSize) {
    auto image = cv::imread("../img.jpg");
    if (image.empty()) {
        throw std::runtime_error("Could not load image");
    }

    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);

    const int inputH = m_inputDims.d[2];
    const int inputW = m_inputDims.d[3];

    // Preprocess: scale to [0, 1], then normalize to [-1, 1] (subtract 0.5, divide by 0.5)
    image.convertTo(image, CV_32FC3, 1.f / 255.f);
    cv::subtract(image, cv::Scalar(0.5f, 0.5f, 0.5f), image, cv::noArray(), -1);
    cv::divide(image, cv::Scalar(0.5f, 0.5f, 0.5f), image, 1, -1);

    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer("input.1"));

    // Convert the interleaved HWC (RGB) data into the planar CHW layout the network
    // expects. The image is assumed to already be inputH x inputW (112 x 112).
    int r = 0, g = 0, b = 0;
    for (int i = 0; i < inputH * inputW * 3; ++i) {
        if (i % 3 == 0) {
            hostDataBuffer[r++] = *(reinterpret_cast<float*>(image.data) + i);
        } else if (i % 3 == 1) {
            hostDataBuffer[g++ + inputH * inputW] = *(reinterpret_cast<float*>(image.data) + i);
        } else {
            hostDataBuffer[b++ + inputH * inputW * 2] = *(reinterpret_cast<float*>(image.data) + i);
        }
    }

    for (int i = 0; i < 30; ++i) {
        std::cout << hostDataBuffer[i] << " ";
    }
    std::cout << "\n\n";

    return true;
}

For a batch size of 1, this works great. However, how would I adapt the above code to work for a batch size greater than 1? The call float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer("input.1")); only gives me enough memory for a single image, so how do I ensure enough memory has been allocated for the number of images I plan on running per batch?
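
To make the question concrete, here is roughly the per-image offset arithmetic I imagine would be needed once the host buffer is actually sized for the whole batch (the helper name and its parameters are just illustrative, not a real API):

#include <cstddef>
#include <cstring>

// Hypothetical helper: copy one preprocessed CHW image into its slot of a
// batched input buffer, assuming the buffer holds batchSize * imageVolume floats.
void copyImageIntoBatch(float* batchedBuffer, const float* chwImage,
                        std::size_t imageVolume, std::size_t batchIndex) {
    std::memcpy(batchedBuffer + batchIndex * imageVolume, chwImage,
                imageVolume * sizeof(float));
}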

Additionally, something I am confused about: how does the call bool status = context->executeV2(buffers.getDeviceBindings().data()); know the batch size, since we pass no argument stating how large the buffers handed to the call are?

Environment

TensorRT Version: 8.0.3.4
GPU Type: RTX 3080
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version:
Operating System + Version: Ubuntu 20.04

Hi,

Please refer to the link below for working with dynamic shapes:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes
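
For reference, a minimal runtime sketch, assuming the engine has been rebuilt with a dynamic (-1) batch dimension; the helper name setBatchSize is only for illustration, while the tensor name "input.1" and the 3x112x112 shape are taken from your code:

#include <NvInfer.h>

// Before calling executeV2(), tell the execution context the actual input
// shape, including the batch size (TensorRT 8.x API).
bool setBatchSize(nvinfer1::ICudaEngine& engine, nvinfer1::IExecutionContext& context, int batchSize) {
    const int inputIndex = engine.getBindingIndex("input.1");
    if (inputIndex < 0) {
        return false;  // input tensor not found
    }
    if (!context.setBindingDimensions(inputIndex, nvinfer1::Dims4{batchSize, 3, 112, 112})) {
        return false;  // shape rejected, e.g. outside the optimization profile range
    }
    return context.allInputDimensionsSpecified();
}

executeV2() then runs with whatever batch size the binding dimensions describe, which is why it takes no explicit batch argument; the host and device buffers just need to be sized for batchSize * 3 * 112 * 112 floats.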

You can tune the engine for a specific input dimension range using optimization profiles:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#opt_profiles
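
For example, a rough build-time sketch for registering a batch range of 1 to 32 (the 1/8/32 values and the helper are only placeholders; builder and config are the usual IBuilder / IBuilderConfig objects, and the network's batch dimension must already be dynamic, e.g. by exporting the ONNX model with a dynamic batch axis):

#include <NvInfer.h>

// Give the dynamic batch dimension a min/opt/max range via an optimization
// profile (TensorRT 8.x API). "input.1" and 3x112x112 come from the model above.
void addBatchProfile(nvinfer1::IBuilder& builder, nvinfer1::IBuilderConfig& config) {
    nvinfer1::IOptimizationProfile* profile = builder.createOptimizationProfile();
    profile->setDimensions("input.1", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 112, 112});
    profile->setDimensions("input.1", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{8, 3, 112, 112});
    profile->setDimensions("input.1", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{32, 3, 112, 112});
    config.addOptimizationProfile(profile);
}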

The following example may help you.

Thank you.