TensorRT multiple processes


I created a TensorRT engine file for a model, created a context, and ran inference in Python.
It works fine for a single inference. Now I'm trying to load different contexts in the same Python script, using Python's threading module.
A single inference takes 30 ms, but it takes 112 ms when two different contexts run at the same time in two different threads.
I'm trying to run two inferences simultaneously, using two different contexts, so that both finish in 30 ms in total, but it is not performing as expected. Please let me know if there is any documentation where I can read more about multiple inferences / multiple contexts in a single Python script.

I also tried the multiprocessing module in Python, but somehow it doesn't let me call context.push.
Let me know what approach I can use.
I have tried running 4 Python scripts in different terminals, each doing inference, and that works fine. I want to incorporate this into a single Python script.

Hi @jhanvi,

A GPU can execute only a single context at a time. When you launch multiple contexts, they get scheduled one after another rather than running in parallel, which increases the inference time. If you must use multiple processes, CUDA MPS may be useful.
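
To make the scheduling point concrete, here is a toy sketch in plain Python. It does not call TensorRT or CUDA at all. A lock stands in for the GPU, and a sleep stands in for one 30 ms inference. The names fake_inference and run_concurrently are hypothetical and exist only for this illustration. The model is deliberately simplified and ignores any context-switch overhead, so it only shows why two threads cannot both finish in 30 ms, not the exact timings you measured.

```python
import threading
import time

# Assumption: the GPU is modelled as a lock, so only one
# context's work can run at any moment.
gpu = threading.Lock()


def fake_inference(duration_s=0.03):
    """Hypothetical stand-in for one inference: holds the 'GPU' for duration_s."""
    with gpu:
        time.sleep(duration_s)


def run_concurrently(n_threads, duration_s=0.03):
    """Start n_threads fake inferences together; return wall-clock seconds until all finish."""
    threads = [
        threading.Thread(target=fake_inference, args=(duration_s,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"1 thread : {run_concurrently(1) * 1000:.0f} ms")
    print(f"2 threads: {run_concurrently(2) * 1000:.0f} ms")
```

In this model the two-thread run takes roughly twice as long as the single run, because the second thread waits for the first to release the lock.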

Thank you.