I implemented an ObjectDetector class which contains the runtime context.
It runs well on a single thread (e.g. X fps),
but with multiple threads (e.g. N threads)
it only runs at about X/N fps on each thread.
What should I check?
I have also read the best-practices section
of the TensorRT docs, but it is unclear to me.
Also, please pay attention to the GPU workload of your detection model.
If it already sits at 99% utilization, multiple threads have to wait in turn for the resources.
Thanks for your reply.
Do I need to do anything about the execution context?
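The X/N per-thread rate is exactly what you would see if every thread serializes on one shared execution context (or if the GPU is already saturated, as noted above). As far as I understand the TensorRT threading rules, an ICudaEngine can be shared, but each thread should create its own IExecutionContext and CUDA stream. A minimal sketch of the difference, with `time.sleep` standing in for the GPU work (illustrative only, not TensorRT API):

```python
import threading
import time

SIM_INFER_TIME = 0.2  # pretend one inference occupies 200 ms
N_THREADS = 4

# Pattern A: one shared context guarded by a lock -- every thread
# waits its turn, so per-thread fps collapses to roughly X/N.
shared_lock = threading.Lock()

def infer_shared():
    with shared_lock:
        time.sleep(SIM_INFER_TIME)  # stands in for context.execute_v2(...)

# Pattern B: each thread owns its own context (created from the same
# engine) and its own stream, so inferences can overlap.
def infer_private():
    time.sleep(SIM_INFER_TIME)

def run(worker):
    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

t_shared = run(infer_shared)
t_private = run(infer_private)
print(f"shared context: {t_shared:.2f}s, per-thread contexts: {t_private:.2f}s")
```

If your detector class holds a single context behind a mutex, that alone would explain the scaling you measured; note that even with per-thread contexts, a model that already fills the GPU will not scale.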
[edit]
I generated my model using your command,
but the performance doesn't change.
When I tried it, the model load time on the other threads seemed shorter than on the first thread. Does TRT use a cache?
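On the load-time observation: my guess is that the first thread pays one-time costs (CUDA context creation, library initialization, engine deserialization) that the later threads skip, rather than TRT caching per thread. A common pattern is to deserialize the engine exactly once and share it, with each thread only creating its own lightweight context. A generic sketch of that load-once pattern (the names here are illustrative, not TensorRT API):

```python
import threading

_engine = None
_engine_lock = threading.Lock()

def load_engine(path):
    # Simulated expensive one-time load; in TensorRT this would be
    # deserializing the serialized plan file with the runtime.
    return {"plan": path}

def get_engine(path):
    """Deserialize the engine once; every later caller reuses it."""
    global _engine
    with _engine_lock:
        if _engine is None:
            _engine = load_engine(path)
    return _engine

# Each worker shares the engine but would make its own context from it.
results = []

def worker():
    engine = get_engine("detector.plan")  # hypothetical path
    results.append(id(engine))            # same object in every thread

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With this structure, only the first caller pays the deserialization cost, which matches the timing you observed.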