Hi all,
I am new to TensorRT and I am trying to implement an inference server with it. Everything works fine if I run the engine in a single thread, but I run into problems as soon as I try to run it in another thread. For example, the following is the sample code from tensorrt/samples/python/end_to_end_tensorflow_mnist/sample.py, modified only so that the inference can also run in a separate thread:
def infer(model_file, data_path):
    with build_engine(model_file) as engine:
        # Build an engine, allocate buffers and create a stream.
        # For more information on buffer allocation, refer to the introductory samples.
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)
        with engine.create_execution_context() as context:
            case_num = load_normalized_test_case(data_path, pagelocked_buffer=inputs[0].host)
            # For more information on performing inference, refer to the introductory samples.
            # The common.do_inference function will return a list of outputs - we only have one in this case.
            [output] = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
            pred = np.argmax(output)
            print("Test Case: " + str(case_num))
            print("Prediction: " + str(pred))

def main():
    data_path = common.find_sample_data(description="Runs an MNIST network using a UFF model file", subfolder="mnist")
    model_file = ModelData.MODEL_FILE

    # This works fine
    infer(model_file, data_path)

    # Error
    t = threading.Thread(target=infer, args=(model_file, data_path))
    t.start()
    t.join()
When I run it in a separate thread, I get the following error:
pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?
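If I understand correctly, "import pycuda.autoinit" creates a CUDA context and makes it current only in the importing (main) thread, so the worker thread has no active context when PyCUDA calls into the driver. One workaround I have seen suggested is to push the main thread's context in the worker before doing any CUDA work. This is only a sketch (the helper name infer_in_thread is mine, and I have not verified it beyond the MNIST sample):

import threading
import pycuda.autoinit  # creates a context, current only in this importing thread

def infer_in_thread(model_file, data_path):
    pycuda.autoinit.context.push()     # make the main thread's context current here
    try:
        infer(model_file, data_path)
    finally:
        pycuda.autoinit.context.pop()  # keep this thread's context stack balanced

t = threading.Thread(target=infer_in_thread, args=(model_file, data_path))
t.start()
t.join()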
Try 1:
It seems the CUDA context in the new thread is not initialized. So I commented out "import pycuda.autoinit" and tried to initialize the context myself, adding the following code at the beginning and end of the infer() function:
cuda.init()                    # initialize the CUDA driver API
device = cuda.Device(0)
ctx = device.make_context()    # create a context and make it current in this thread

# infer body
...

ctx.pop()                      # deactivate the context before the thread exits
This works fine for the MNIST example. However, when I try another engine that contains convolution layers, I get the following error:
[TensorRT] ERROR: cuda/cudaConvolutionLayer.cpp (163) - Cudnn Error in execute: 7
[TensorRT] ERROR: cuda/cudaConvolutionLayer.cpp (163) - Cudnn Error in execute: 7
Again, this engine works fine if I only run it in a single thread. (If I am reading the cuDNN headers correctly, status 7 is CUDNN_STATUS_MAPPING_ERROR.)
At this point I am stuck. Does anyone have suggestions on how to implement a multi-threaded inference server with TensorRT? Any advice would be appreciated.
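For reference, the direction I am considering next is to create a single CUDA context and engine at startup, share them across threads, and push/pop the shared context around each call while serializing inference with a lock (since the buffers and stream are shared). This is only an untested sketch; TRTServer and the load_input callback are made up for illustration, while build_engine and common come from the sample:

import threading
import pycuda.driver as cuda

class TRTServer:
    def __init__(self, model_file):
        cuda.init()
        self.ctx = cuda.Device(0).make_context()  # current in the creating thread
        try:
            # Build everything that touches the GPU inside this one context.
            self.engine = build_engine(model_file)
            self.inputs, self.outputs, self.bindings, self.stream = \
                common.allocate_buffers(self.engine)
            self.context = self.engine.create_execution_context()
        finally:
            self.ctx.pop()  # detach the context from the creating thread
        self.lock = threading.Lock()

    def infer(self, load_input):
        with self.lock:       # buffers and stream are shared, so serialize calls
            self.ctx.push()   # make the shared context current in the calling thread
            try:
                load_input(self.inputs[0].host)  # caller fills the pagelocked input
                [output] = common.do_inference(
                    self.context, bindings=self.bindings,
                    inputs=self.inputs, outputs=self.outputs, stream=self.stream)
                return output
            finally:
                self.ctx.pop()

Does this pattern (one context built once, pushed per call under a lock) sound reasonable, or is a separate execution context per thread the better approach?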
Thanks.