TensorRT Python Client Runtime Error

I am having issues running the TensorRT Python client on one of our systems. More specifically, the TensorRT Python runtime API cannot run across multiple processes, although it works fine in a single process.

Here are the methods/workarounds/tests that I have tried:
1. Creating the TensorRT runtime object outside of multiprocessing first, then passing it into the multiprocessing worker, which deserializes the plan into a CUDA engine. This gives me a runtime error: [TensorRT] ERROR: cudaDeviceProfile.cpp (52) - Cuda Error in generateForCurrent: 3 (initialization error)
2. The same code works fine in the Python console.
3. It works fine in the TensorRT Python sample code.
4. Creating the TensorRT engine and execution context outside of the multiprocessing worker and injecting the objects into the process. However, during inference an error occurred:
[TensorRT] ERROR: engine.cpp (370) - Cuda Error in ~ExecutionContext: 3 (initialization error)
terminate called after throwing an instance of 'nvinfer1::CudaError'
what(): std::exception

Please assist with this.

Please refer to - https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#thread-safety

The TensorRT runtime can be used by multiple threads simultaneously, so long as each object uses a different execution context.

So, does each of your runtimes use a different execution context?
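To illustrate the pattern the thread-safety note describes, here is a shape-only sketch in plain Python. The names (make_context, infer_in_threads) are illustrative stand-ins, not TensorRT API; the point is that one shared engine hands each thread its own context:

```python
import threading

def make_context(engine):
    # Stand-in for engine.create_execution_context(): the deserialized
    # engine may be shared across threads, but each thread must drive
    # inference through its OWN execution context.
    return {"engine": engine, "thread": threading.current_thread().name}

def infer_in_threads(engine, n_threads=2):
    results = [None] * n_threads

    def worker(i):
        ctx = make_context(engine)   # created per thread, never shared
        results[i] = ctx["thread"]

    threads = [threading.Thread(target=worker, args=(i,), name=f"t{i}")
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker builds its context inside the thread function; only the read-only engine crosses the thread boundary.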

Please check my post: https://devtalk.nvidia.com/default/topic/1061870/tensorrt/can-tensorrt-do-inference-in-python-thread-or-subprocess-/post/5377259/#5377259

It may be useful to you.

My implementation of multiprocessing uses the 'fork' start method instead of 'spawn'. May I know whether the TensorRT Python client is compatible with the 'fork' method of multiprocessing? Please assist. Thank you.
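For reference, the start method can be selected explicitly per context in Python's multiprocessing module. A minimal, TensorRT-free sketch of why the choice matters (the worker name in the comment is illustrative):

```python
import multiprocessing as mp

def make_spawn_context():
    # 'fork' (the Linux default) duplicates the parent's address space,
    # including any CUDA driver state already initialized there; that
    # copied state is invalid in the child and surfaces later as
    # "initialization error".  'spawn' starts a fresh interpreter with
    # no inherited CUDA state, so the child builds everything itself.
    return mp.get_context("spawn")

# Processes created from this context start clean, e.g.:
#   ctx = make_spawn_context()
#   p = ctx.Process(target=worker_that_builds_its_own_trt_objects)
```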

Sorry! I misread your question as being about multithreading. A CUDA context cannot be shared between processes either; in other words, each process owns its own CUDA context, so CUDA-based TensorRT cannot share any objects between processes.
In your case, when a new process accesses a CUDA resource that was created in another CUDA context, it is expected to report: Cuda Error in generateForCurrent: 3 (initialization error).
BTW, I would highlight that a CUDA kernel from one CUDA context cannot execute concurrently with a CUDA kernel from another CUDA context, so multiple CPU processes with multiple CUDA contexts may not be able to utilize the GPU 100%.

If you still want to run with multiple processes, you need to create your own TensorRT resources in each process.
Also, refer to https://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions#How_does_PyCUDA_handle_threading.3F and luisyin's comment for CUDA context creation in a separate process.
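A minimal sketch of that per-process pattern, in plain Python with the TensorRT calls shown only as comments (plan_path, the queue protocol, and all names here are illustrative, not a definitive implementation):

```python
import multiprocessing as mp

def inference_worker(plan_path, jobs, results):
    # ALL CUDA/TensorRT objects are created here, inside the child
    # process; nothing CUDA-related is inherited from the parent.
    # In a real worker this is roughly where you would do:
    #   import tensorrt as trt
    #   runtime = trt.Runtime(trt.Logger())
    #   with open(plan_path, "rb") as f:
    #       engine = runtime.deserialize_cuda_engine(f.read())
    #   context = engine.create_execution_context()
    engine = "engine from " + plan_path       # stand-in for the real engine
    for job in iter(jobs.get, None):          # None is the shutdown signal
        results.put((job, engine))

def run_worker_once(plan_path="model.plan"):
    # 'fork' is fine here provided the PARENT never touches CUDA before
    # forking; otherwise switch to mp.get_context("spawn").
    ctx = mp.get_context("fork")
    jobs, results = ctx.Queue(), ctx.Queue()
    p = ctx.Process(target=inference_worker, args=(plan_path, jobs, results))
    p.start()
    jobs.put("input_0")
    out = results.get()
    jobs.put(None)                            # tell the worker to exit
    p.join()
    return out
```

The parent only moves serializable data (paths, inputs, outputs) across the process boundary; every CUDA-backed object lives and dies inside the worker.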

Noted. I have tried creating the TensorRT runtime inside the multiprocessing worker, but it still did not resolve the issue. Will this be resolved in a future release of TensorRT?