Thread safety when using TensorRT

I use TensorRT 4 to do inference. I load the engine into memory once, and then multiple threads run inference through a function like this:

int TensorRTEngine::doInference(float *input, float *output, int batchSize) {
    int64_t start = getCurrentTime();
    IExecutionContext *context = engine->createExecutionContext();
    if (!context) {
        EngineError("failed to create execution context");
        return -1;
    }
    int64_t end = getCurrentTime();
    EngineError("create context cost %lld", (long long)(end - start));
//    EngineError("maxBatchSize:%d",engine->getMaxBatchSize());
    // Input and output buffer pointers that we pass to the engine. The engine
    // requires exactly IEngine::getNbBindings() of these, but in this case we
    // know that there is exactly one input and one output.
    assert(engine->getNbBindings() == 2);
    void* buffers[2];

    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // note that indices are guaranteed to be less than IEngine::getNbBindings()
    int inputIndex = engine->getBindingIndex(inputBlobName.c_str()),
            outputIndex = engine->getBindingIndex(outPutBlobName.c_str());
    size_t inputSize = batchSize * getInputSize() * sizeof(float);
    size_t outputSize = batchSize * getOutPutSize() * sizeof(float);
    // create GPU buffers and a stream
    CHECK(cudaMalloc(&buffers[inputIndex], inputSize));
    CHECK(cudaMalloc(&buffers[outputIndex], outputSize));
    //context.setProfiler(&gProfiler);
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    // DMA the input to the GPU, execute the batch asynchronously, and DMA the output back.
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, inputSize, cudaMemcpyHostToDevice, stream));
    if (!context->enqueue(batchSize, buffers, stream, nullptr)) {
        EngineError("enqueue failed");
    }
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], outputSize, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    // release the stream and the buffers
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
    context->destroy();
    return 0;
}
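
For reference, the threads invoke this function roughly like this (a simplified sketch of my calling code; runConcurrent, the thread count, and the public size helpers are illustrative assumptions, not my exact code):

#include <thread>
#include <vector>

// N worker threads share one TensorRTEngine instance and call doInference() concurrently.
void runConcurrent(TensorRTEngine &engine, int numThreads, int batchSize) {
    std::vector<std::thread> workers;
    for (int i = 0; i < numThreads; ++i) {
        workers.emplace_back([&engine, batchSize] {
            // Each thread owns its host buffers; only the engine is shared.
            std::vector<float> input(batchSize * engine.getInputSize());
            std::vector<float> output(batchSize * engine.getOutPutSize());
            engine.doInference(input.data(), output.data(), batchSize);
        });
    }
    for (auto &t : workers) t.join();
}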

Every call creates a new IExecutionContext from the engine, but I find that when many threads call this function concurrently, the results are sometimes incorrect. Does TensorRT guarantee thread safety here?

Hi,

Please see the docs on thread-safety here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#thread-safety

2.3. Thread Safety
The TensorRT builder may only be used by one thread at a time. If you need to run multiple builds simultaneously, you will need to create multiple builders.

The TensorRT runtime can be used by multiple threads simultaneously, so long as each object uses a different execution context.

Note: Plugins are shared at the engine level, not the execution context level, and thus plugins which may be used simultaneously by multiple threads need to manage their resources in a thread-safe manner.

The TensorRT library pointer to the logger is a singleton within the library. If using multiple builder or runtime objects, use the same logger, and ensure that it is thread-safe.
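
Following this guidance, a common pattern is to create one execution context per thread up front and reuse it, instead of creating a context inside every doInference() call; since the engine object is shared across threads, it is also safest to serialize the createExecutionContext() calls. A minimal sketch of that pattern (illustrative only, not an official sample; the mutex and workerLoop structure are assumptions):

#include <mutex>
#include <NvInfer.h>

std::mutex gEngineMutex;  // serializes calls into the shared ICudaEngine

void workerLoop(nvinfer1::ICudaEngine *engine) {
    nvinfer1::IExecutionContext *context = nullptr;
    {
        // Create this thread's private context while guarding the shared engine.
        std::lock_guard<std::mutex> lock(gEngineMutex);
        context = engine->createExecutionContext();
    }
    // Reuse this context, plus a per-thread CUDA stream and device buffers, for
    // every inference this thread performs (same body as doInference() above).
    // ...
    context->destroy();
}

Each thread then drives its own context and stream, which satisfies the "different execution context" requirement quoted above.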

Thanks,
NVIDIA Enterprise Support