Hi, after moving a large project from Triton 20.03 to Triton 20.07,
we see a performance degradation.
CPU usage increases from 500% to ~1000% (roughly double) during stress testing.
All models use the ‘tensorrt_plan’ platform, and the system uses only the ‘cuda shared memory’ technique to interact with Triton.
It is not entirely clear why Triton uses so much CPU.
Could you please suggest a profiling technique to find the exact source of the performance degradation inside Triton?
System:
Driver Version: 470.57.02 CUDA Version: 11.4
5 T4 GPUs
Ubuntu 18.04.5 LTS