TensorRT 5 inference speed slows down in multithreaded application.

I wrote a demo using TensorRT 5.1.5.0 to test SSD inference speed. In the single-threaded case, inference is fast. I then changed the demo to use multiple threads and tested again: inference is slower than in the single-threaded case, and GPU utilization also drops. Any ideas? Thank you!

This may be related:

‘TensorRT supports multiple threads so long as each is used with a separate execution context.’

From: https://devtalk.nvidia.com/default/topic/1032481/jetson-tx2/how-to-run-trt-in-multithreading-/post/5252796/#5252796
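Following the quote above, the usual pattern is to build one `ICudaEngine` and share it across threads, while each thread creates its own `IExecutionContext` with its own device buffers and CUDA stream. A minimal sketch of that pattern, assuming an already-built engine and per-thread bindings (the buffer setup is elided and hypothetical):

```cpp
#include <thread>
#include <vector>
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Each worker gets the shared engine but creates a private execution context.
// IExecutionContext is not thread-safe, so it must never be shared between threads.
void worker(nvinfer1::ICudaEngine* engine, void** bindings, int batchSize)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // A per-thread stream lets inferences from different threads overlap on the GPU.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Asynchronous inference on this thread's own bindings (device buffers).
    context->enqueue(batchSize, bindings, stream, nullptr);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    context->destroy(); // TensorRT 5 objects are released via destroy()
}

// Launch one worker per thread; `engine` is shared, `perThreadBindings[i]` is
// a hypothetical array of separately allocated device buffers for thread i.
void runMultithreaded(nvinfer1::ICudaEngine* engine,
                      std::vector<void**>& perThreadBindings, int batchSize)
{
    std::vector<std::thread> threads;
    for (void** bindings : perThreadBindings)
        threads.emplace_back(worker, engine, bindings, batchSize);
    for (auto& t : threads)
        t.join();
}
```

Note that even with separate contexts, concurrent contexts contend for the same GPU, so per-thread latency can still rise; overlapping streams mainly helps throughput rather than single-inference speed.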


Please refer to the link below for some suggestions on using TensorRT with multiple threads: