I am new to TensorRT and I am trying to implement an inference server with it. Everything works fine if I run the engine in a single thread, but I run into problems when I try to run it from multiple threads. For example, the following sample code is from tensorrt/samples/python/end_to_end_tensorflow_mnist/sample.py; I modified it only so that infer() also runs in a second thread:
def infer(model_file, data_path):
    with build_engine(model_file) as engine:
        # Build an engine, allocate buffers and create a stream.
        # For more information on buffer allocation, refer to the introductory samples.
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)
        with engine.create_execution_context() as context:
            case_num = load_normalized_test_case(data_path, pagelocked_buffer=inputs.host)
            # For more information on performing inference, refer to the introductory samples.
            # The common.do_inference function will return a list of outputs - we only have one in this case.
            [output] = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
            pred = np.argmax(output)
            print("Test Case: " + str(case_num))
            print("Prediction: " + str(pred))

def main():
    data_path = common.find_sample_data(description="Runs an MNIST network using a UFF model file", subfolder="mnist")
    model_file = ModelData.MODEL_FILE

    # This works fine
    infer(model_file, data_path)

    # Error
    t = threading.Thread(target=infer, args=(model_file, data_path))
    t.start()
    t.join()
When I run it in a separate thread, it gives me the following error:
pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?
It seems the CUDA context is not initialized in the new thread. So I commented out "import pycuda.autoinit" and initialized the CUDA context myself, adding the following code at the beginning and end of infer():
cuda.init()
device = cuda.Device(0)
ctx = device.make_context()

# infer body ...

ctx.pop()
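One thing I noticed while experimenting: if the infer body raises, the pop at the end never runs and the context stays pushed on the thread's stack. Wrapping the create/pop pair in a context manager makes the pop unconditional. The sketch below is runnable without a GPU because it uses a DummyContext stand-in (my assumption is that it mirrors pycuda's semantics, where make_context() both creates the context and pushes it onto the calling thread's stack); with pycuda you would substitute cuda.Device(0).make_context() for the factory call.

```python
import threading
from contextlib import contextmanager

class DummyContext:
    """Stand-in for a pycuda context, so this sketch runs without a GPU."""
    def __init__(self):
        self.pushed = True   # make_context() leaves the context pushed

    def pop(self):
        self.pushed = False

@contextmanager
def thread_context(factory=DummyContext):
    # With pycuda this would be: cuda.init(); ctx = cuda.Device(0).make_context()
    ctx = factory()
    try:
        yield ctx
    finally:
        ctx.pop()            # always popped, even if inference raises

def worker(out):
    with thread_context() as ctx:
        out.append(ctx.pushed)   # context active while "inference" runs
    out.append(ctx.pushed)       # popped again after the with-block

out = []
t = threading.Thread(target=worker, args=(out,))
t.start()
t.join()
print(out)  # [True, False]
```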
This works fine for the MNIST example. However, when I try another engine containing a CNN, I get the following error:
[TensorRT] ERROR: cuda/cudaConvolutionLayer.cpp (163) - Cudnn Error in execute: 7
[TensorRT] ERROR: cuda/cudaConvolutionLayer.cpp (163) - Cudnn Error in execute: 7
And again, this engine works fine if I only run it in a single thread.
Now I have no idea how to solve these problems. Does anyone have suggestions on how to implement a multi-threaded inference server with TensorRT? Any advice would be appreciated.
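One workaround I am considering, in case it helps frame the question: confine the engine and its CUDA context to a single dedicated worker thread and feed it requests through a queue, so no CUDA object is ever touched from two threads. Below is a GPU-free sketch of that server shape; infer_fn here is a stub standing in for the real common.do_inference call, and with TensorRT the engine, context, and buffers would all be created inside serve() before the loop.

```python
import queue
import threading

def serve(requests, results, infer_fn):
    """Worker loop that would own the engine and CUDA context exclusively."""
    # With TensorRT: build/deserialize the engine, push the context, and
    # allocate buffers here, inside the worker thread, before looping.
    while True:
        item = requests.get()
        if item is None:              # sentinel: shut down the worker
            break
        results.put(infer_fn(item))

requests, results = queue.Queue(), queue.Queue()
infer_fn = lambda x: x * 2            # stub for common.do_inference(...)

t = threading.Thread(target=serve, args=(requests, results, infer_fn))
t.start()
for x in (1, 2, 3):
    requests.put(x)                   # callers only ever touch the queues
requests.put(None)
t.join()

outputs = [results.get() for _ in range(3)]
print(outputs)  # [2, 4, 6]
```

Since a single worker drains the queue in FIFO order, outputs come back in request order; I do not know whether this is the recommended pattern for TensorRT, which is part of what I am asking.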