I’m using torch2trt to convert my PyTorch model to TensorRT. TRTModule creates a single execution context (IExecutionContext) for the runtime engine (https://github.com/NVIDIA-AI-IOT/torch2trt/blob/master/torch2trt/torch2trt.py#L333).
My inference code is concurrent and uses a different CUDA stream for each inference call.
Only a single inference per stream is guaranteed to be in flight at any given moment.
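Roughly, the setup looks like this (a simplified sketch; the checkpoint path, batch shapes, and worker threads below are illustrative, not my actual code):

```python
import threading
import torch
from torch2trt import TRTModule

# Load a TensorRT engine previously converted with torch2trt
# (checkpoint path is illustrative).
model_trt = TRTModule()
model_trt.load_state_dict(torch.load('model_trt.pth'))

def infer_on_stream(stream, batch):
    # Each worker runs its inference on its own CUDA stream;
    # at most one inference is in flight per stream at any time.
    with torch.cuda.stream(stream):
        output = model_trt(batch)
    stream.synchronize()
    return output

streams = [torch.cuda.Stream() for _ in range(4)]
batches = [torch.randn(1, 3, 224, 224).cuda() for _ in range(4)]

threads = [
    threading.Thread(target=infer_on_stream, args=(s, b))
    for s, b in zip(streams, batches)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```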
Is it correct to use just one execution context for multi-stream concurrent inference?