Hi,
I get very different inference timings from trtexec on two Jetson Nano devices that run different versions of JetPack and TensorRT.
Device 1: Jetson Nano, JetPack 4.4, TensorRT 7.1.3
Device 2: Jetson Nano, JetPack 4.6, TensorRT 8.1.2
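There seems to be a problem with device 2. For any ONNX model, trtexec on device 2 reports much higher latency than on device 1. In the example below, the median GPU compute time is about 19.2 ms on device 1 versus about 62.8 ms on device 2, and throughput drops from 40.6 qps to 12.6 qps. The trtexec outputs for one ONNX model on both devices are given below.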
For Device 1:
[05/15/2023-10:23:54] [I] Host Latency
[05/15/2023-10:23:54] [I] min: 17.8914 ms (end to end 17.9038 ms)
[05/15/2023-10:23:54] [I] max: 36.7483 ms (end to end 38.3356 ms)
[05/15/2023-10:23:54] [I] mean: 24.061 ms (end to end 24.6089 ms)
[05/15/2023-10:23:54] [I] median: 19.6371 ms (end to end 20.1515 ms)
[05/15/2023-10:23:54] [I] percentile: 36.5361 ms at 99% (end to end 37.5812 ms at 99%)
[05/15/2023-10:23:54] [I] throughput: 40.6347 qps
[05/15/2023-10:23:54] [I] walltime: 3.02697 s
[05/15/2023-10:23:54] [I] Enqueue Time
[05/15/2023-10:23:54] [I] min: 6.39282 ms
[05/15/2023-10:23:54] [I] max: 11.272 ms
[05/15/2023-10:23:54] [I] median: 7.11914 ms
[05/15/2023-10:23:54] [I] GPU Compute
[05/15/2023-10:23:54] [I] min: 17.4719 ms
[05/15/2023-10:23:54] [I] max: 36.2471 ms
[05/15/2023-10:23:54] [I] mean: 23.6288 ms
[05/15/2023-10:23:54] [I] median: 19.182 ms
[05/15/2023-10:23:54] [I] percentile: 36.1172 ms at 99%
[05/15/2023-10:23:54] [I] total compute time: 2.90634 s
For Device 2:
[05/15/2023-12:41:32] [I] === Performance summary ===
[05/15/2023-12:41:32] [I] Throughput: 12.5568 qps
[05/15/2023-12:41:32] [I] Latency: min = 66.3296 ms, max = 198.232 ms, mean = 79.1126 ms, median = 67.4297 ms, percentile(99%) = 198.232 ms
[05/15/2023-12:41:32] [I] End-to-End Host Latency: min = 66.3831 ms, max = 210.906 ms, mean = 79.6359 ms, median = 67.4835 ms, percentile(99%) = 210.906 ms
[05/15/2023-12:41:32] [I] Enqueue Time: min = 7.81299 ms, max = 13.1021 ms, mean = 10.4438 ms, median = 10.4058 ms, percentile(99%) = 13.1021 ms
[05/15/2023-12:41:32] [I] H2D Latency: min = 3.33521 ms, max = 10.3575 ms, mean = 4.20191 ms, median = 3.6214 ms, percentile(99%) = 10.3575 ms
[05/15/2023-12:41:32] [I] GPU Compute Time: min = 61.6451 ms, max = 189.297 ms, mean = 73.8214 ms, median = 62.764 ms, percentile(99%) = 189.297 ms
[05/15/2023-12:41:32] [I] D2H Latency: min = 0.906616 ms, max = 1.24756 ms, mean = 1.08936 ms, median = 1.09595 ms, percentile(99%) = 1.24756 ms
[05/15/2023-12:41:32] [I] Total Host Walltime: 3.18552 s
[05/15/2023-12:41:32] [I] Total GPU Compute Time: 2.95285 s
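Since the two TensorRT versions print the summary in different layouts, here is a minimal Python sketch for pulling the GPU compute numbers out of both logs and comparing them. The function name gpu_compute_stats and the regular expressions are my own and only assume the two layouts shown above; other trtexec versions may format the summary differently.

```python
import re

# Matches the "[date-time] [I] " prefix that trtexec puts on each log line.
PREFIX = re.compile(r"^\[[^\]]*\]\s*\[[A-Z]\]\s*")

def gpu_compute_stats(log_text):
    """Return GPU compute stats in ms (min, max, mean, median) from either layout."""
    stats = {}
    in_section = False
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw.strip())
        # TensorRT 8.x layout: all statistics on a single line.
        if line.startswith("GPU Compute Time:"):
            pairs = re.findall(r"(\w+)(?:\(\d+%\))? = ([\d.]+) ms", line)
            return {name: float(value) for name, value in pairs}
        # TensorRT 7.x layout: a header line, then one statistic per line.
        if line == "GPU Compute":
            in_section = True
            continue
        if in_section:
            m = re.match(r"(min|max|mean|median): ([\d.]+) ms", line)
            if m:
                stats[m.group(1)] = float(m.group(2))
            elif not line.startswith("percentile"):
                in_section = False  # left the GPU Compute section
    return stats

# Lines copied from the two summaries above.
DEVICE1_LOG = """\
[05/15/2023-10:23:54] [I] GPU Compute
[05/15/2023-10:23:54] [I] min: 17.4719 ms
[05/15/2023-10:23:54] [I] max: 36.2471 ms
[05/15/2023-10:23:54] [I] mean: 23.6288 ms
[05/15/2023-10:23:54] [I] median: 19.182 ms
[05/15/2023-10:23:54] [I] percentile: 36.1172 ms at 99%
[05/15/2023-10:23:54] [I] total compute time: 2.90634 s
"""
DEVICE2_LOG = """\
[05/15/2023-12:41:32] [I] GPU Compute Time: min = 61.6451 ms, max = 189.297 ms, mean = 73.8214 ms, median = 62.764 ms, percentile(99%) = 189.297 ms
"""

d1 = gpu_compute_stats(DEVICE1_LOG)
d2 = gpu_compute_stats(DEVICE2_LOG)
print(f"median GPU compute slowdown: {d2['median'] / d1['median']:.2f}x")
```

With the numbers above this gives a median slowdown of about 3.27x on device 2. The enqueue times are much closer (about 7.1 ms versus 10.4 ms median), so most of the gap is in GPU compute.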
Both devices are in MAXN mode, and the power supplies should be adequate.
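In case it helps to compare the settings on both devices, here is a minimal Python sketch for dumping the power mode and clock state. nvpmodel -q and jetson_clocks --show are the standard tools shipped with JetPack; the helper name run_if_available is only illustrative, and the commands may need to be run with sudo. The script avoids newer subprocess options because JetPack 4.x ships Python 3.6.

```python
import shutil
import subprocess

def run_if_available(cmd):
    """Run cmd and return its stdout, or None if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode, works on Python 3.6
    )
    return result.stdout

if __name__ == "__main__":
    # Query the active power mode, then the current clock settings.
    for cmd in (["nvpmodel", "-q"], ["jetson_clocks", "--show"]):
        out = run_if_available(cmd)
        print(" ".join(cmd))
        print(out if out is not None else "  (tool not found on this machine)")
```

Running it on each device and diffing the two outputs should show whether the GPU and memory clocks really match.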
What could be causing this performance difference, and how can I troubleshoot it?
Thanks.