First, qps: Throughput: 115.914 qps, Latency: min = 9.33789 ms, max = 14.3026 ms, mean = 9.67278 ms, median = 9.61938 ms, percentile(99%) = 10.2032 ms
[11/22/2022-02:23:05] [I] Enqueue Time: min = 0.906738 ms, max = 2.44945 ms, mean = 1.70993 ms, median = 1.74048 ms, percentile(99%) = 2.29266 ms
[11/22/2022-02:23:05] [I] H2D Latency: min = 0.406738 ms, max = 0.47998 ms, mean = 0.42492 ms, median = 0.422363 ms, percentile(99%) = 0.471191 ms
[11/22/2022-02:23:05] [I] GPU Compute Time: min = 8.26126 ms, max = 13.2157 ms, mean = 8.58368 ms, median = 8.53056 ms, percentile(99%) = 9.12329 ms
[11/22/2022-02:23:05] [I] D2H Latency: min = 0.653076 ms, max = 0.677811 ms, mean = 0.664174 ms, median = 0.663818 ms, percentile(99%) = 0.671997 ms
[11/22/2022-02:23:05] [I] Total Host Walltime: 3.02811 s
[11/22/2022-02:23:05] [I] Total GPU Compute Time: 3.01287 s
[11/22/2022-02:23:05] [W] * GPU compute time is unstable, with coefficient of variance = 3.40474%.
[11/22/2022-02:23:05] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[11/22/2022-02:23:05] [I] Explanations of the performance metrics are printed in the verbose logs.
Second, total_fps = 8 * 16 = 128.
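As a rough cross-check of the trtexec summary above, the figures can be pulled out of the log and compared with each other. This is only a sketch: `LOG`, `parse_summary`, and the dictionary keys are names made up for illustration, and the regular expressions assume the summary lines look exactly like the ones pasted here.

```python
import re

# Three lines copied from the trtexec summary quoted above.
LOG = """\
Throughput: 115.914 qps
GPU Compute Time: min = 8.26126 ms, max = 13.2157 ms, mean = 8.58368 ms, median = 8.53056 ms, percentile(99%) = 9.12329 ms
Total Host Walltime: 3.02811 s
"""

def parse_summary(text):
    """Extract throughput (qps), mean GPU compute time (ms), and host walltime (s)."""
    patterns = {
        "throughput_qps": r"Throughput:\s*([\d.]+)\s*qps",
        "gpu_mean_ms": r"GPU Compute Time:.*?mean\s*=\s*([\d.]+)\s*ms",
        "walltime_s": r"Total Host Walltime:\s*([\d.]+)\s*s",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            result[key] = float(match.group(1))
    return result

summary = parse_summary(LOG)

# Upper bound on throughput if the GPU ran back-to-back with no host overhead.
gpu_bound_qps = 1000.0 / summary["gpu_mean_ms"]

# Approximate number of inferences executed during the measured window.
queries = summary["throughput_qps"] * summary["walltime_s"]

print(f"reported: {summary['throughput_qps']:.1f} qps, GPU-bound: {gpu_bound_qps:.1f} qps")
print(f"approx. queries in window: {queries:.0f}")
```

With these numbers the GPU-bound figure comes out at roughly 116.5 qps, very close to the reported 115.914 qps, which suggests that in this run the throughput is limited by GPU compute time rather than by enqueue or transfer overhead.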
$ sudo nvidia-smi -pm ENABLED -i 0      # enable persistence mode (assuming the T4's GPU ID is 0)
$ sudo nvidia-smi -ac 5001,1590 -i 0    # set the memory clock and the graphics clock (MHz)
$ nvidia-smi -q -d CLOCK -i 0           # confirm the clocks
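If you want to script the confirmation step, something like the following might work. It is a sketch under assumptions: the helper names (`parse_clocks`, `query_application_clocks`, `clocks_locked`) are hypothetical, it relies on the `clocks.applications.memory` and `clocks.applications.graphics` query fields of `nvidia-smi --query-gpu`, and some driver versions may report `[N/A]` for these fields, which this sketch does not handle.

```python
import subprocess

def parse_clocks(csv_line):
    """Parse a 'memory, graphics' CSV line (values in MHz) into a tuple of ints."""
    mem, gfx = (int(field.strip()) for field in csv_line.split(","))
    return mem, gfx

def query_application_clocks(gpu_id=0):
    """Ask nvidia-smi for the application clocks of one GPU (requires an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=clocks.applications.memory,clocks.applications.graphics",
        "--format=csv,noheader,nounits",
        "-i", str(gpu_id),
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_clocks(out.strip().splitlines()[0])

def clocks_locked(actual, expected=(5001, 1590)):
    """Return True when the (memory, graphics) clocks match the requested values."""
    return actual == expected

# Example with a sample output line instead of a live query:
print(clocks_locked(parse_clocks("5001, 1590")))
```

On a machine with the GPU present, `clocks_locked(query_application_clocks(0))` would do the live check; the example above uses a sample line so that it runs without a GPU.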
Also, since the throughput you got is lower than what I got in the screenshot above (115 vs. 137), CPU capability may be another possible reason besides the GPU clock.