Triton server performance degradation: 20.03 => 20.07

Hi, after moving a large project from Triton 20.03 to Triton 20.07,
we see a performance degradation.
CPU usage roughly doubles, from 500% to ~1000%, during stress testing.
All models are ‘tensorrt_plan’ models, and the system uses only the ‘cuda shared memory’ technique to interact with the server.

It is not clear why Triton uses so much CPU.
Could you please suggest a profiling technique to find the exact source of the performance degradation inside Triton?
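
As a first step toward narrowing this down, here is a minimal sketch (not an official Triton tool) that samples per-thread CPU time of a running process from /proc on Linux. The idea is to run it against the server PID under both versions and compare which threads account for the extra CPU. The script layout, the function names, and the 5-second default interval are my own assumptions. A sampling profiler such as Linux perf could then be pointed at the busiest threads.

```python
import os
import sys
import time


def parse_stat_line(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    name = line[open_idx + 1:close_idx]
    rest = line[close_idx + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return name, utime + stime


def snapshot(pid):
    """Map thread id -> (name, cpu_ticks) for every thread of the process."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_stat_line(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
    return ticks


def busiest_threads(before, after, interval, ticks_per_sec, top=10):
    """Return (cpu_percent, tid, name) tuples, busiest first."""
    rows = []
    for tid, (name, t1) in after.items():
        if tid not in before:
            continue  # thread started mid-interval; skip it for simplicity
        t0 = before[tid][1]
        rows.append((100.0 * (t1 - t0) / ticks_per_sec / interval, tid, name))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    # Usage (hypothetical file name): python thread_cpu.py <server_pid> [seconds]
    pid = int(sys.argv[1])
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 5.0
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    first = snapshot(pid)
    time.sleep(interval)
    second = snapshot(pid)
    for pct, tid, name in busiest_threads(first, second, interval, ticks_per_sec):
        print(f"{pct:7.1f}%  tid={tid}  {name}")
```

The percentages are per thread, so 100% corresponds to one fully busy core over the sampling interval.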

Driver Version: 470.57.02, CUDA Version: 11.4
5 x T4 GPUs
Ubuntu 18.04.5 LTS

Please re-post your question on the Triton Inference Server GitHub page, where NVIDIA and other teams will be able to help you.
Sorry for the inconvenience and thanks for your patience.