Setting the batch size in TensorRT using the C++ API

Description

Hi, I’m still learning how to utilise TensorRT.

  • I exported an ONNX model the conventional way and changed the input node’s dimensions to ?x3x384x1120.
  • Then I built an engine that supports batching using the following command:

trtexec --explicitBatch --onnx=midas_384.onnx --minShapes=INPUTS:1x3x384x1120 --optShapes=INPUTS:4x3x384x1120 --maxShapes=INPUTS:32x3x384x1120 --shapes=INPUTS:4x3x384x1120 --fp16 --verbose --workspace=2000 --saveEngine=midas_384.engine
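
For reference, the min/opt/max shapes in the command above determine how large the input buffer has to be at each batch size. The following is a minimal sketch, assuming 4-byte float elements for the INPUTS binding; `parseShape`, `volume` and `floatBytes` are hypothetical helper names used only for illustration, not TensorRT APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a trtexec-style shape string such as "4x3x384x1120".
std::vector<std::int64_t> parseShape(const std::string& text)
{
    std::vector<std::int64_t> dims;
    std::stringstream stream(text);
    std::string token;
    while (std::getline(stream, token, 'x'))
    {
        dims.push_back(std::stoll(token));
    }
    return dims;
}

// Number of elements in a tensor with the given dimensions.
std::int64_t volume(const std::vector<std::int64_t>& dims)
{
    std::int64_t count = 1;
    for (std::int64_t d : dims)
    {
        assert(d > 0); // a dynamic dimension must be resolved to a concrete value first
        count *= d;
    }
    return count;
}

// Bytes needed for one float tensor of the given shape (assumes float I/O).
std::int64_t floatBytes(const std::string& shape)
{
    return volume(parseShape(shape)) * static_cast<std::int64_t>(sizeof(float));
}
```

With these shapes, one image holds 3 x 384 x 1120 = 1,290,240 floats, so the opt shape (batch 4) needs 20,643,840 bytes and the max shape (batch 32) needs 165,150,720 bytes (157.5 MiB) for the input alone.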

I’m following this sample C++ code to run inference on my data. In that code, I’ve concatenated 4 of my input images (the batch size) and copied them into the hostDataBuffer.
Can you please suggest how I can set the batch size in this sample code?

Thanks!
Code used: midas.cpp (9.0 KB)

Environment

TensorRT Version: 7.1.3
GPU Type: Jetson NX
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04 LTS

Hi,
Please refer to the link below for the sample guide:
https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html
Refer to the installation steps at the following link in case you are missing anything:
https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html
However, the suggested approach is to use the TensorRT NGC containers to avoid system dependency issues:
https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt

To run the Python samples, make sure the TensorRT Python packages are installed when using the NGC container:
/opt/tensorrt/python/python_setup.sh

If you are trying to run a custom model, please share your model and script with us so that we can assist you better.
Thanks!

I was able to solve the issue by passing the context to the buffer manager. I had forgotten to add that.
Thanks anyway!


Hi @NVES
When I used my method for batching, I was able to get n outputs, but they are gibberish unless I set BATCH = 1. Please help me verify whether I’m reading the model’s output correctly. I’m following the ONNX MNIST C++ code sample.

The modified build function:

bool SampleInference::build()
{
    std::vector<char> trtModelStream_;
    size_t size{ 0 };

    std::ifstream file("/media/31A079936F39FBF9/onnx_cache_trt/model.trt", std::ios::binary);

    if (file.good())
    {
       
        file.seekg(0, file.end);
        size = file.tellg();
        file.seekg(0, file.beg);
        trtModelStream_.resize(size);
        file.read(trtModelStream_.data(), size);
        file.close();
    }

    IRuntime* runtime = createInferRuntime(sample::gLogger);
    
    mEngine_midas_hq = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(trtModelStream_.data(), size, nullptr), samplesCommon::InferDeleter());
    
    if (!mEngine_midas_hq)
    {
        return false;
    }

    context_iExecutionContext = (mEngine_midas_hq->createExecutionContext());
    context_midas_hq = SampleUniquePtr<nvinfer1::IExecutionContext>(context_iExecutionContext);
    nvinfer1::Dims4 input_dimensions(BATCH,3,384,1120);
    context_midas_hq->setBindingDimensions(0,input_dimensions);
   
    return true;
}
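
Note that the build function above only reads the engine file inside `if (file.good())`, so a missing or unreadable file leaves an empty buffer that is still handed to deserializeCudaEngine. A minimal sketch of a stricter loader follows; `readBinaryFile` is a hypothetical helper name, and throwing on failure is one possible design choice rather than what the sample does.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole binary file into memory,
// throwing std::runtime_error if it cannot be opened or read.
std::vector<char> readBinaryFile(const std::string& path)
{
    // Open at the end so tellg() reports the file size directly.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
    {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize size = file.tellg();
    if (size <= 0)
    {
        throw std::runtime_error("empty or unreadable file: " + path);
    }
    file.seekg(0, std::ios::beg);

    std::vector<char> data(static_cast<std::size_t>(size));
    if (!file.read(data.data(), size))
    {
        throw std::runtime_error("short read on " + path);
    }
    assert(file.gcount() == size); // the whole file was consumed
    return data;
}
```

The returned vector's `data()` and `size()` could then be passed to the deserialization call in place of `trtModelStream_.data()` and `size`.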

The slightly modified infer function:

vector<cv::Mat> SampleInference::infer(vector<cv::Mat> &inputs_fin)
{
    samplesCommon::BufferManager buffers(mEngine_midas_hq, 0, context_iExecutionContext);
    //cudaStream_t stream;    
    //cudaStreamCreate(&stream);

    bool status_processInput = processInput(buffers,inputs_fin);
    
    //buffers.copyInputToDeviceAsync();
    buffers.copyInputToDevice();

    //bool status_inference = context_midas_hq->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr);
    bool status_inference = context_midas_hq->executeV2(buffers.getDeviceBindings().data());
    
    //buffers.copyOutputToHostAsync();
    buffers.copyOutputToHost();

    //cudaStreamSynchronize(stream);
    //cudaStreamDestroy(stream);

    vector<cv::Mat> output_fin = processOutput(buffers);
    return output_fin;
}

The modified processInput function:

bool SampleInference::processInput(const samplesCommon::BufferManager& buffers, vector<cv::Mat>& input)
{
    int batch = BATCH; // correct output only when BATCH=1
    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer("INPUTS"));
    for (int batch_i = 0; batch_i < batch; batch_i++)
    {
        cv::Mat input1; //bgr image, hwc
        cv::resize(input[batch_i], input1, cv::Size(1120, 384), 0, 0, cv::INTER_CUBIC); 
        
        Normalizer normalizer; // bgr -> rgb, hwc -> chw, normalize
        cv::Mat refined = normalizer.Process(input1);

        cv::Mat linear_refined = refined.reshape(1,refined.total()*refined.channels());
        for (int i = (int)(linear_refined.rows)*batch_i; i < (batch_i + 1)*(int)(linear_refined.rows); i++)
        {
            hostDataBuffer[i] = (float)linear_refined.at<float>(cv::Point(i-(int)(linear_refined.rows)*batch_i,0));
        }
    }
    return true;
}
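
The indexing in processInput amounts to writing image `batch_i` at offset `batch_i * 3 * 384 * 1120` in a contiguous NCHW buffer. A minimal sketch of that layout in plain C++, without OpenCV, is shown below. `packImage` is a hypothetical name, and scaling pixels to [0, 1] is an assumption: the Normalizer class in the post is not shown and may apply different mean and standard deviation values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy one interleaved HWC, BGR, 8-bit image into slot `batchIndex` of a
// planar NCHW, RGB float buffer. The 1/255 scaling is an assumed normalization.
void packImage(const std::vector<std::uint8_t>& bgrHwc,
               std::size_t height, std::size_t width,
               std::size_t batchIndex, std::vector<float>& batchBuffer)
{
    const std::size_t channels = 3;
    const std::size_t plane = height * width;        // elements per channel plane
    const std::size_t imageVolume = channels * plane; // elements per image
    assert(bgrHwc.size() == imageVolume);
    assert(batchBuffer.size() >= (batchIndex + 1) * imageVolume);

    // Start of this image's slot inside the batched buffer.
    float* slot = batchBuffer.data() + batchIndex * imageVolume;
    for (std::size_t pixel = 0; pixel < plane; ++pixel)
    {
        for (std::size_t c = 0; c < channels; ++c)
        {
            // Source order is B,G,R per pixel; destination planes are R,G,B.
            const std::size_t srcChannel = channels - 1 - c;
            slot[c * plane + pixel] = bgrHwc[pixel * channels + srcChannel] / 255.0f;
        }
    }
}
```

Keeping the slot offset in one place makes it easier to check that every image lands in its own, non-overlapping region of the host buffer.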

The modified verifyOutput function (renamed processOutput):

vector<cv::Mat> SampleInference::processOutput(const samplesCommon::BufferManager& buffers)
{
    int batch = BATCH; //correct output only when BATCH=1
    vector<cv::Mat> out;
    float* output = static_cast<float*>(buffers.getHostBuffer("OUTPUTS"));
    for(int batch_i=0; batch_i < batch; batch_i++)
    {
        float* output_i = output+batch_i*(384*1120);
        cv::Mat outputs = cv::Mat(384, 1120, CV_32FC1, output_i);   
        out.push_back(600000.0f*(1.0f/outputs));//Changed 6000 to 600000 because of outputs*100.0f
    }
    return out;
}
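
When reading a batched output, each image occupies its own contiguous slice of the flat buffer. The sketch below splits such a buffer into independent copies, assuming the OUTPUTS binding is laid out as N x H x W single-channel floats, which the post implies but does not confirm. `splitBatchOutput` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split a flat N x H x W float buffer into N independent per-image copies.
// Copying means the results stay valid after the source buffer is freed.
std::vector<std::vector<float>> splitBatchOutput(const float* output,
                                                 std::size_t batch,
                                                 std::size_t height,
                                                 std::size_t width)
{
    assert(output != nullptr);
    const std::size_t imageVolume = height * width;
    std::vector<std::vector<float>> images;
    images.reserve(batch);
    for (std::size_t b = 0; b < batch; ++b)
    {
        const float* begin = output + b * imageVolume; // start of image b
        images.emplace_back(begin, begin + imageVolume);
    }
    return images;
}
```

A `cv::Mat` constructed from a raw pointer does not copy the data, so any result that must outlive the buffer manager needs its own storage, as the copies here provide.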

Hi,

It looks like there are some mistakes in the code. For example, we may need to pass a stream here.

Please refer to the Developer Guide (NVIDIA Deep Learning TensorRT Documentation) for step-by-step details.

Thank you.

I’m using executeV2 instead of enqueueV2, so I thought the stream was optional. Also, when I use:
bool status_inference = context_midas_hq->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr);
(that is, the commented-out lines in SampleInference::infer), I face the same issue.

Could you please give us more details on your changes? Have you done this for a dynamic batch size?

I generated the ONNX model using the torch.onnx.export command. The input to the model at this stage was 1x3x384x1120. Then I used the following code to change the model’s input dimensions to ?x3x384x1120.

import onnx

model = onnx.load('model.onnx')
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'
onnx.save(model, 'model.onnx')
onnx.checker.check_model(model)

After that I converted it to a TensorRT engine using the trtexec command:

trtexec --explicitBatch --onnx=model.onnx --minShapes=INPUTS:1x3x384x1120 --optShapes=INPUTS:4x3x384x1120 --maxShapes=INPUTS:32x3x384x1120 --shapes=INPUTS:4x3x384x1120 --fp16 --verbose --workspace=2000 --saveEngine=model.trt

Then I used the code I sent in the previous reply, with BATCH set to 4.
The code works when BATCH is 1, with or without changing the input layer’s dimensions, but gives wrong output when BATCH is 4.

In that case, you may need to use a dynamic-input optimization profile for inference.
Please refer to the following:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work_dynamic_shapes
https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html#sample-dynamic-reshape

Thank you.
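
With dynamic shapes, the runtime input shape has to fall inside the min/max range of the optimization profile the engine was built with. The check can be sketched in plain C++ as below, using the range from the trtexec command in this thread. `Shape`, `ProfileRange` and `shapeInProfile` are hypothetical names for illustration and are not part of the TensorRT API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Four-dimensional NCHW shape.
using Shape = std::array<int, 4>;

// Hypothetical stand-in for the min/max shapes of one optimization profile.
struct ProfileRange
{
    Shape min;
    Shape max;
};

// Returns true if every dimension of `shape` lies within [min, max].
bool shapeInProfile(const Shape& shape, const ProfileRange& range)
{
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        assert(range.min[i] <= range.max[i]); // profile must be well formed
        if (shape[i] < range.min[i] || shape[i] > range.max[i])
        {
            return false;
        }
    }
    return true;
}
```

For the engine discussed here, a batch of 4 at 3x384x1120 lies inside the 1..32 range, while a batch of 64 or a different spatial size would not.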
