I ran some benchmarks on the Orin dev kit using the jetson_benchmarks repo (NVIDIA-AI-IOT/jetson_benchmarks on GitHub).
Power mode: MAXN
TensorRT 8.4 in JetPack 5
The throughput is less than 300 fps, while Xavier NX gets 800 fps and Xavier AGX gets 1,500 fps.
What went wrong? Please advise. Here are the command I ran and its output:
/usr/src/tensorrt/bin/trtexec --onnx=/home/andy/nvidia/jetson_benchmarks/models/ssd-mobilenet-v1-bs16.onnx --useSpinWait --useCudaGraph --int8 --workspace=4096 --avgRuns=100 --duration=180
[05/06/2022-20:57:24] [I] === Performance summary ===
[05/06/2022-20:57:24] [I] Throughput: 296.952 qps
[05/06/2022-20:57:24] [I] Latency: min = 4.01562 ms, max = 12.0008 ms, mean = 4.78139 ms, median = 4.70898 ms, percentile(99%) = 6.21875 ms
[05/06/2022-20:57:24] [I] Enqueue Time: min = 0 ms, max = 0.867188 ms, mean = 0.0241754 ms, median = 0.0200195 ms, percentile(99%) = 0.0664062 ms
[05/06/2022-20:57:24] [I] H2D Latency: min = 0.503906 ms, max = 3.08443 ms, mean = 1.1139 ms, median = 1.13281 ms, percentile(99%) = 1.1875 ms
[05/06/2022-20:57:24] [I] GPU Compute Time: min = 3.20312 ms, max = 10.6925 ms, mean = 3.36628 ms, median = 3.27734 ms, percentile(99%) = 4.72656 ms
[05/06/2022-20:57:24] [I] D2H Latency: min = 0.140625 ms, max = 0.869156 ms, mean = 0.300852 ms, median = 0.300781 ms, percentile(99%) = 0.3125 ms
[05/06/2022-20:57:24] [I] Total Host Walltime: 180.009 s
[05/06/2022-20:57:24] [I] Total GPU Compute Time: 179.941 s
[05/06/2022-20:57:24] [W] * GPU compute time is unstable, with coefficient of variance = 9.40032%.
[05/06/2022-20:57:24] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
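
One thing that may be worth checking is the unit: the summary above reports throughput in qps (queries per second), not fps. The following is a minimal sketch, assuming the `bs16` in the model filename means each query carries a batch of 16 images. That is an assumption, since the log itself does not state the batch size. The helper names `parse_throughput_qps` and `images_per_second` are made up for illustration and are not part of trtexec or jetson_benchmarks.

```python
import re

# Throughput line copied verbatim from the performance summary above.
LOG = "[05/06/2022-20:57:24] [I] Throughput: 296.952 qps\n"

# ASSUMPTION: "bs16" in ssd-mobilenet-v1-bs16.onnx means batch size 16.
ASSUMED_BATCH_SIZE = 16


def parse_throughput_qps(log: str) -> float:
    """Return the 'Throughput: N qps' value found in a trtexec log."""
    match = re.search(r"Throughput:\s*([0-9.]+)\s*qps", log)
    if match is None:
        raise ValueError("no Throughput line found in log")
    return float(match.group(1))


def images_per_second(qps: float, batch_size: int) -> float:
    """Convert queries per second to images per second for a fixed batch."""
    return qps * batch_size


if __name__ == "__main__":
    qps = parse_throughput_qps(LOG)
    fps = images_per_second(qps, ASSUMED_BATCH_SIZE)
    print(f"{qps:.3f} qps x batch {ASSUMED_BATCH_SIZE} = {fps:.1f} images/s")
```

If the batch-size assumption holds, the number to compare against the Xavier figures would be the images-per-second value rather than the raw qps. Whether the Xavier figures quoted above were measured the same way is not something this sketch can confirm.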