Performance discrepancy using TensorRT engines

romilaggarwal611 · September 27, 2021, 5:11am

Description

Hi, I’m building an SDK that uses multiple engines. When each model is tested alone, its inference time is close to the mean time reported by trtexec --loadEngine=<model.engine> --iterations=100. But when run in the SDK, all the models perform worse (sometimes by as much as 40%!).
In the SDK, I do the ‘init’ for all the models together (basically loading the engine and creating the context). After that, I call inference on whichever engine I need. I have 4 models loaded.

Am I doing something wrong or is this the expected behaviour? Is there a better way to do it?

Environment

TensorRT Version: 7.1.3
GPU Type: Jetson NX
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.6.9

NVES · September 27, 2021, 5:37am

Hi,
Please share the model, script, profiler output, and performance output, if you have not already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#model-accuracy
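
As a rough illustration of timing only the inference call, here is a minimal C++ sketch. It is not taken from the TensorRT samples: `measureLatency`, `LatencyStats`, and `inferOnce` are hypothetical names, and `inferOnce` is assumed to be a synchronous call that returns only after the GPU work has finished. Running the same harness around each model, both standalone and inside the SDK, should make the two numbers directly comparable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Summary of per-call latencies, in milliseconds.
struct LatencyStats
{
    double meanMs;
    double medianMs;
    double maxMs;
};

// Times only the callable passed in. Pre- and post-processing should stay
// outside `inferOnce` so they do not inflate the measurement.
// Assumption: `inferOnce` is a hypothetical stand-in for a synchronous
// inference call, i.e. one that returns only after the GPU work is done.
LatencyStats measureLatency(const std::function<void()>& inferOnce,
                            std::size_t warmupRuns, std::size_t timedRuns)
{
    assert(timedRuns > 0);

    // Warm-up runs are discarded; the first calls often include one-time costs.
    for (std::size_t i = 0; i < warmupRuns; ++i)
    {
        inferOnce();
    }

    std::vector<double> samplesMs;
    samplesMs.reserve(timedRuns);
    for (std::size_t i = 0; i < timedRuns; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        inferOnce();
        const auto stop = std::chrono::steady_clock::now();
        samplesMs.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // Sort once so the median and the maximum can be read off directly.
    std::sort(samplesMs.begin(), samplesMs.end());
    const double sum = std::accumulate(samplesMs.begin(), samplesMs.end(), 0.0);
    const std::size_t mid = samplesMs.size() / 2;
    const double median = (samplesMs.size() % 2 == 0)
        ? 0.5 * (samplesMs[mid - 1] + samplesMs[mid])
        : samplesMs[mid];

    return LatencyStats{sum / static_cast<double>(samplesMs.size()),
                        median, samplesMs.back()};
}
```

A call such as `measureLatency(runModel, 10, 100)` would discard 10 warm-up runs and then time 100 runs, where `runModel` is whatever wrapper performs a single inference for one engine.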

Thanks!

romilaggarwal611 · October 4, 2021, 4:41am

My inference code is very similar to the ONNXMNIST sample. Only the build function is modified so that it also creates the context. The build function for each model is called in the INIT() of the SDK, and the infer function is called when required.
If there is a better way to use the engines for different models simultaneously, please let me know.

bool SampleInference::build()
{
    std::vector<char> trtModelStream_;
    size_t size{ 0 };

    std::ifstream file("/media/31A079936F39FBF9/romil/onnx_cache_trt/midas_384_new_folded_questionmark.trt", std::ios::binary);

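    if (!file.good())
    {
        // Fail early instead of passing an empty buffer to deserializeCudaEngine.
        return false;
    }

    // Read the whole serialized engine into memory.
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream_.resize(size);
    file.read(trtModelStream_.data(), size);
    file.close();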

    IRuntime* runtime = createInferRuntime(sample::gLogger);
    
    mEngine_hq = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(trtModelStream_.data(), size, nullptr), samplesCommon::InferDeleter());
    
    if (!mEngine_hq)
    {
        return false;
    }

    context_iExecutionContext = mEngine_hq->createExecutionContext();
    if (!context_iExecutionContext)
    {
        return false;
    }
    context_hq = SampleUniquePtr<nvinfer1::IExecutionContext>(context_iExecutionContext);
    nvinfer1::Dims4 input_dimensions(BATCH, 3, 384, 1120);
    // int binding_index = mEngine_hq->getBindingIndex("INPUTS");
    context_hq->setBindingDimensions(0, input_dimensions);
   
    return true;
}
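
The file-reading step in build() could also be factored into a small helper that reports failure explicitly, so a missing or truncated engine file is caught before deserialization. The following is only a sketch using the standard library; `readEngineFile` and its parameters are hypothetical names, not part of the TensorRT samples.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads a whole file into memory. Returns true only if every byte was read.
// `readEngineFile` is a hypothetical helper name used for this sketch.
bool readEngineFile(const std::string& path, std::vector<char>& buffer)
{
    // Open at the end so tellg() reports the file size directly.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
    {
        return false;
    }

    const std::streamsize size = static_cast<std::streamsize>(file.tellg());
    if (size <= 0)
    {
        return false; // tellg() failed or the file is empty
    }

    buffer.resize(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    file.read(buffer.data(), size);

    // gcount() is the number of bytes the last read actually extracted.
    return file.gcount() == size;
}
```

With a helper like this, build() could return false as soon as the read fails, and pass buffer.data() and buffer.size() to the deserialization call otherwise.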

spolisetty · October 5, 2021, 2:55pm

Hi,

We recommend trying the latest TensorRT version. If you still face the performance issue, please share an ONNX model that reproduces the issue, along with the script or steps to reproduce it, so that we can try it on our end and help you better.

Thank you.
