How to run model inference in parallel on the NVIDIA Xavier NX platform


I want to run model inference in parallel on the NVIDIA Xavier NX platform. I tried to create multiple engines, each bound to a different DLA core, and to create multiple execution contexts to run inference.

  • Would you give me some guidance on running model inference in parallel on this device?
  • Can multiple execution context objects be created from the same engine object in TensorRT?


TensorRT Version : 7.1.3
GPU Type :
Nvidia Driver Version :
CUDA Version : 10.2
CUDNN Version :
Operating System + Version :
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

Relevant Files

  • create multiple engines
bool TensorrtExecutor::constructNetwork(UniquePtr<nvinfer1::IBuilder>& builder,
        UniquePtr<nvinfer1::INetworkDefinition>& network,
        UniquePtr<nvinfer1::IBuilderConfig>& config,
        UniquePtr<nvonnxparser::IParser>& parser)
{
    auto parsed = parser->parseFromFile(Iparams.ParamsOnnx.onnxFileName.c_str(),
            static_cast<int>(bench::gLogger.getReportableSeverity()));
    if (!parsed)
    {
        std::cerr << "parse model error" << std::endl;
        return false;
    }

    if (Iparams.ParamsOnnx.fp16)
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    if (Iparams.ParamsOnnx.int8)
    {
        config->setFlag(nvinfer1::BuilderFlag::kINT8);
        bench::setAllTensorScales(network.get(), 127.0f, 127.0f);
    }

    int count = 0;
    // Build one engine per DLA core, up to num_threads engines.
    for (int i = builder->getNbDLACores(); i > 0 && count < this->num_threads; i--)
    {
        std::cout << "Using DLA core " << (i - 1) << std::endl;
        bench::enableDLA(builder.get(), config.get(), i - 1);
        Iengine = std::shared_ptr<nvinfer1::ICudaEngine>(
                builder->buildEngineWithConfig(*network, *config), bench::InferDeleter());
        if (!Iengine)
        {
            std::cout << "The engine create failed." << std::endl;
            // return false;
        }
        Iengines.push_back(Iengine);  // collect engines for per-thread contexts
        count++;
    }
    return true;
}

  • create multiple contexts
 for (int i = 0; i < this->num_threads; i++)
 {
        std::cout << "the size of Iengines is " << Iengines.size() << std::endl;
        int k = i % Iengines.size();
        std::cout << "the no." << k << " Engine." << std::endl;
        auto context = UniquePtr<nvinfer1::IExecutionContext>(Iengines[k]->createExecutionContext());
        // auto context = UniquePtr<nvinfer1::IExecutionContext>(Iengine->createExecutionContext());
        if (!context)
        {
            std::cout << "can not create context" << std::endl;
        }
 }

  • Then I create multiple threads to run inference on the same model with different pictures in parallel.

Thanks for your time.

Hi, all

  • Running engines in parallel is allowed, but when additional contexts are created from an engine that is already associated with one context, I get an error.
  • Does the above conclusion hold?


You can launch trtexec in different consoles for parallel inference.

$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --useDLACore=0 --allowGPUFallback
$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --useDLACore=1 --allowGPUFallback

If you are looking for a multi-thread inference example, please check the below comment:

We create two engines for the same model, each with its corresponding context, to run them in parallel.



Is there any work on monitoring the inference load in real time and adjusting it accordingly?


You will need separate engines and contexts for parallel usage.
With the same model, this can be done by creating each engine from the same model file.


Hi, following the documentation for running TensorRT inference with multiple streams, is that the same case? I'm confused about whether we can create a single engine just once and then use it with multiple streams (as in the documentation), or whether we need to create multiple engines from the same file?

Hi dangnh0611,

Please help to open a new topic. Thanks