Setting the batch size in TensorRT using the C++ API


Hi, I’m still learning how to utilise TensorRT.

  • I generated an ONNX model the conventional way and changed the input node’s dimensions to ?x3x384x1120.
  • Then I created an engine that supports batching using the following command:

trtexec --explicitBatch --onnx=midas_384.onnx --minShapes=INPUTS:1x3x384x1120 --optShapes=INPUTS:4x3x384x1120 --maxShapes=INPUTS:32x3x384x1120 --shapes=INPUTS:4x3x384x1120 --fp16 --verbose --workspace=2000 --saveEngine=midas_384.engine

I’m following this sample C++ code to run inference on my data. In that code, I’ve concatenated 4 input images (my batch size) and copied them into hostDataBuffer.
Can you please suggest how I can set the batch size in this sample code?

TensorRT Version: 7.1.3
GPU Type: Jetson NX
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04 LTS

I was able to solve the issue by passing the context to the buffer manager. I had forgotten to add that.
Thanks anyway!

When I used my method for batching, I was able to get n outputs, but they are gibberish UNLESS I set BATCH = 1. Please help me verify whether I’m reading the model’s outputs correctly. I’m following the ONNX MNIST C++ code sample.

The modified build function:

bool SampleInference::build()
{
    std::vector<char> trtModelStream_;
    size_t size{ 0 };

    // Read the serialized engine from disk into memory.
    std::ifstream file("/media/31A079936F39FBF9/onnx_cache_trt/model.trt", std::ios::binary);

    if (file.good())
    {
        file.seekg(0, file.end);
        size = file.tellg();
        file.seekg(0, file.beg);
        trtModelStream_.resize(size);
        file.read(trtModelStream_.data(), size);
    }

    IRuntime* runtime = createInferRuntime(sample::gLogger);
    mEngine_midas_hq = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(trtModelStream_.data(), size, nullptr), samplesCommon::InferDeleter());
    if (!mEngine_midas_hq)
    {
        return false;
    }

    context_iExecutionContext = (mEngine_midas_hq->createExecutionContext());
    context_midas_hq = SampleUniquePtr<nvinfer1::IExecutionContext>(context_iExecutionContext);
    nvinfer1::Dims4 input_dimensions(BATCH,3,384,1120);
    return true;
}

The slightly modified infer function:

vector<cv::Mat> SampleInference::infer(vector<cv::Mat> &inputs_fin)
{
    samplesCommon::BufferManager buffers(mEngine_midas_hq, 0, context_iExecutionContext);
    //cudaStream_t stream;

    bool status_processInput = processInput(buffers,inputs_fin);

    //bool status_inference = context_midas_hq->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr);
    bool status_inference = context_midas_hq->executeV2(buffers.getDeviceBindings().data());

    vector<cv::Mat> output_fin = processOutput(buffers);
    return output_fin;
}

The modified processInput function:

bool SampleInference::processInput(const samplesCommon::BufferManager& buffers, vector<cv::Mat>& input)
{
    int batch = BATCH; // correct output only when BATCH=1
    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer("INPUTS"));
    for (int batch_i = 0; batch_i < batch; batch_i++)
    {
        cv::Mat input1; //bgr image, hwc
        cv::resize(input[batch_i], input1, cv::Size(1120, 384), 0, 0, cv::INTER_CUBIC);
        Normalizer normalizer; // bgr -> rgb, hwc -> chw, normalize
        cv::Mat refined = normalizer.Process(input1);

        // Flatten to a single channel with one float per row.
        cv::Mat linear_refined = refined.reshape(1, refined.total()*refined.channels());
        for (int i = (int)(linear_refined.rows)*batch_i; i < (batch_i + 1)*(int)(linear_refined.rows); i++)
        {
            hostDataBuffer[i] = (float)linear_refined.at<float>(cv::Point(i-(int)(linear_refined.rows)*batch_i,0));
        }
    }
    return true;
}
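
For reference, the layout that the loop above is aiming for can be written without OpenCV. This is only a sketch under assumptions: each sample is taken to be already preprocessed and flattened in CHW order, and packBatch and sampleSize are illustrative names that do not come from the TensorRT samples. For a 3x384x1120 input, sampleSize would be 3*384*1120 = 1290240 floats, and sample b starts at offset b * sampleSize in the batched NCHW buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies each flattened CHW sample into one contiguous NCHW buffer.
// Sample b occupies [b * sampleSize, (b + 1) * sampleSize).
// packBatch is a hypothetical helper, not part of the TensorRT samples.
std::vector<float> packBatch(const std::vector<std::vector<float>>& samples,
                             std::size_t sampleSize)
{
    std::vector<float> batchBuffer(samples.size() * sampleSize);
    for (std::size_t b = 0; b < samples.size(); ++b)
    {
        // Every sample must have exactly C*H*W elements.
        assert(samples[b].size() == sampleSize);
        const std::size_t offset = b * sampleSize;
        for (std::size_t i = 0; i < sampleSize; ++i)
        {
            batchBuffer[offset + i] = samples[b][i];
        }
    }
    return batchBuffer;
}
```

The resulting buffer could then be copied into the host input buffer in one step, which keeps the per-sample offset arithmetic in a single place.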

The modified output function (verifyOutput in the sample, renamed processOutput here):

vector<cv::Mat> SampleInference::processOutput(const samplesCommon::BufferManager& buffers)
{
    int batch = BATCH; //correct output only when BATCH=1
    vector<cv::Mat> out;
    float* output = static_cast<float*>(buffers.getHostBuffer("OUTPUTS"));
    for(int batch_i=0; batch_i < batch; batch_i++)
    {
        float* output_i = output+batch_i*(384*1120); // start of sample batch_i
        cv::Mat outputs = cv::Mat(384, 1120, CV_32FC1, output_i);
        out.push_back(600000.0f*(1.0f/outputs));//Changed 6000 to 600000 because of outputs*100.0f
    }
    return out;
}
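
The same slicing can be sketched in plain C++ to check the offsets independently of OpenCV. It assumes, as the code above does, that the output holds one single-channel 384x1120 map per sample, stored back to back; splitBatch is a hypothetical helper name. Each slice is copied so that the result stays valid after the buffer manager is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits a flat output buffer holding `batch` samples into per-sample copies.
// Sample b is read from [b * sampleSize, (b + 1) * sampleSize).
// splitBatch is a hypothetical helper, not part of the TensorRT samples.
std::vector<std::vector<float>> splitBatch(const float* output,
                                           std::size_t batch,
                                           std::size_t sampleSize)
{
    assert(output != nullptr);
    std::vector<std::vector<float>> perSample;
    perSample.reserve(batch);
    for (std::size_t b = 0; b < batch; ++b)
    {
        const float* begin = output + b * sampleSize;
        // Deep copy of one sample's values.
        perSample.emplace_back(begin, begin + sampleSize);
    }
    return perSample;
}
```

With the shapes in this thread, sampleSize would be 384*1120 = 430080. If the output buffer was allocated for fewer samples than `batch`, reads past the first sample would land outside valid data, which is one possible source of gibberish for BATCH > 1.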


I’m using executeV2 instead of enqueueV2, so I thought the stream was optional. Also, when using:
bool status_inference = context_midas_hq->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr);
(that is, using the commented-out lines in SampleInference::infer), I’m facing the same issue.

Could you please give us more details on the changes? Have you done this for a dynamic batch size?

I generated the ONNX model using torch.onnx.export. The input to the ONNX model at this stage was 1x3x384x1120. Then I used the following code to change the first dimension of the model’s input to ?x3x384x1120.

   import onnx
   model = onnx.load('model.onnx')
   # Mark the batch dimension of the first input as dynamic.
   model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'
   onnx.save(model, 'model.onnx')

After that I converted it to a TensorRT engine using the trtexec command:

trtexec --explicitBatch --onnx=model.onnx --minShapes=INPUTS:1x3x384x1120 --optShapes=INPUTS:4x3x384x1120 --maxShapes=INPUTS:32x3x384x1120 --shapes=INPUTS:4x3x384x1120 --fp16 --verbose --workspace=2000 --saveEngine=model.trt

Then I used the code I sent in the previous reply, with BATCH set to 4.
The code works when BATCH is 1, with or without changing the input layer’s dimensions, but gives wrong output when BATCH is 4.
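
One thing that may be worth checking for the BATCH = 4 case: with an engine built from a dynamic-shape model, the batch dimension is, as far as I know, reported as -1 until the input shape is set on the execution context (in TensorRT 7 the call for this is IExecutionContext::setBindingDimensions). Buffers sized from an unresolved shape would not match 4x3x384x1120. The sketch below is plain C++, not TensorRT code; elementCount and batchInProfile are hypothetical helpers that only illustrate the two sanity checks, namely that every dimension is resolved and that the batch lies inside the 1 to 32 range given to trtexec.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the number of elements described by `dims`, or -1 if any
// dimension is still unresolved (negative), as a dynamic dimension would be.
// elementCount is a hypothetical helper, not a TensorRT function.
std::int64_t elementCount(const std::vector<int>& dims)
{
    std::int64_t count = 1;
    for (int d : dims)
    {
        if (d < 0)
        {
            return -1; // shape not fully specified yet
        }
        count *= d;
    }
    return count;
}

// True if `batch` lies inside the [minBatch, maxBatch] range that the
// engine's optimization profile was built with.
bool batchInProfile(int batch, int minBatch, int maxBatch)
{
    assert(minBatch <= maxBatch);
    return batch >= minBatch && batch <= maxBatch;
}
```

For the shapes in this thread, a resolved input of 4x3x384x1120 holds 5160960 floats, while a shape that still contains -1 has no meaningful size.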

