TensorRT 7: inference time with dynamic shapes is longer in C++ than in Python


TensorRT Version: 7
GPU Type: GTX 1080 Ti
Nvidia Driver Version: 410.78
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 16.04

When using dynamic shapes for inference, the inference time is much longer with the C++ interface than with the Python interface. The inference results are correct in both cases.
The same engine is used by the C++ code and the Python code.
The optimization profile is added, and the binding dimensions are set before inference.

    auto profile = builder->createOptimizationProfile();
    auto input = network->getInput(0);
    Dims dims = input->getDimensions();
    // set dynamic input tensors
    dims.d[0] = 1;
    profile->setDimensions(input->getName(), OptProfileSelector::kMIN, dims);
    dims.d[0] = std::max(m_config.maxBatchSize, 1);
    profile->setDimensions(input->getName(), OptProfileSelector::kOPT, dims);
    dims.d[0] = std::max(m_config.maxBatchSize+1, 1);
    profile->setDimensions(input->getName(), OptProfileSelector::kMAX, dims);
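    // add the optimization profile to the builder config
    // (call not shown in the original post; `config` is an assumed name for the IBuilderConfig)
    config->addOptimizationProfile(profile);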

    Dims4 inputDims{m_config.maxBatchSize, 3, m_config.inputWidth,  m_config.inputHeight };
    // Set the input size for the preprocessor
    m_pContext->setBindingDimensions(0, inputDims);
    // specify full dynamic input
    if (!m_pContext->allInputDimensionsSpecified()) {
        return -1;
    }


How large is the inference time difference between the C++ and Python interfaces?
Could you please share the model and script files needed to reproduce the issue, so we can help better?


Inference takes 26 ms with the C++ interface and 8 ms with the Python interface.
Can you give me an email address so I can send the model and script files?
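
For anyone comparing numbers like these, it helps to time both interfaces the same way. Below is a minimal sketch of a timing helper, assuming the inference call blocks until the GPU work has finished (for an asynchronous enqueue, the callable would also need to synchronize the stream). The name `meanLatencyMs` and its parameters are hypothetical and not part of TensorRT.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `infer` `warmup` times untimed, then returns the
// mean wall-clock time in milliseconds over `iters` timed runs.
// Assumes `infer` only returns once the inference has actually completed.
double meanLatencyMs(const std::function<void()>& infer, int warmup, int iters) {
    // Warm-up runs absorb one-time costs such as lazy initialization.
    for (int i = 0; i < warmup; ++i) {
        infer();
    }
    if (iters <= 0) {
        return 0.0;  // nothing to measure
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        infer();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}
```

Passing a lambda that wraps the whole per-frame inference path (everything the application does for one batch) makes per-call overhead visible in the average.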


Can you send the files as a message in the forum?
Meanwhile, you can use the NVIDIA profiler to debug the performance of the C++ interface:


Thank you. I have fixed the bug.
The function setBindingDimensions was being called before every inference, so each inference ran from a freshly initialized state. That is why the inference time was so long.
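
One way to avoid the repeated call is to remember the last shape that was applied and only set the binding dimensions again when the shape changes. The following is a rough sketch of that idea, not the code from the actual fix. `Shape4`, `FakeContext`, and `BindingShapeCache` are hypothetical stand-ins so the example compiles without TensorRT; in real code the context would be the `IExecutionContext` and the shape a `Dims4`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for nvinfer1::Dims4.
struct Shape4 {
    std::array<int32_t, 4> d{};
    bool operator==(const Shape4& other) const { return d == other.d; }
};

// Hypothetical stand-in for the execution context.
// It counts how often the binding dimensions are set.
struct FakeContext {
    int setCalls = 0;
    Shape4 current{};
    bool setBindingDimensions(int /*bindingIndex*/, const Shape4& dims) {
        ++setCalls;
        current = dims;
        return true;
    }
};

// Remembers the last shape applied to binding 0 and forwards only changes.
template <typename Context>
class BindingShapeCache {
public:
    explicit BindingShapeCache(Context& ctx) : m_ctx(ctx) {}

    // Returns false only if the underlying setBindingDimensions call fails.
    bool ensure(const Shape4& dims) {
        if (m_valid && dims == m_last) {
            return true;  // shape unchanged: skip the call entirely
        }
        if (!m_ctx.setBindingDimensions(0, dims)) {
            m_valid = false;  // force a retry on the next call
            return false;
        }
        m_last = dims;
        m_valid = true;
        return true;
    }

private:
    Context& m_ctx;
    Shape4 m_last{};
    bool m_valid = false;
};
```

With this in place, the inference loop would call `ensure(inputDims)` before each run; when the batch size stays the same, the dimensions are set only once.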