Performance difference between JetPack and TensorRT versions

Hi,

I get very different inference timings from trtexec on two Jetson Nano devices that run different versions of JetPack and TensorRT.

Device 1: Jetson Nano, JetPack 4.4, TensorRT 7.1.3
Device 2: Jetson Nano, JetPack 4.6, TensorRT 8.1.2

It seems there is a problem with device 2: for any ONNX model, trtexec on device 2 reports much higher latencies than device 1. The trtexec output for the same ONNX model is given below for both devices.

For Device 1:
[05/15/2023-10:23:54] [I] Host Latency
[05/15/2023-10:23:54] [I] min: 17.8914 ms (end to end 17.9038 ms)
[05/15/2023-10:23:54] [I] max: 36.7483 ms (end to end 38.3356 ms)
[05/15/2023-10:23:54] [I] mean: 24.061 ms (end to end 24.6089 ms)
[05/15/2023-10:23:54] [I] median: 19.6371 ms (end to end 20.1515 ms)
[05/15/2023-10:23:54] [I] percentile: 36.5361 ms at 99% (end to end 37.5812 ms at 99%)
[05/15/2023-10:23:54] [I] throughput: 40.6347 qps
[05/15/2023-10:23:54] [I] walltime: 3.02697 s
[05/15/2023-10:23:54] [I] Enqueue Time
[05/15/2023-10:23:54] [I] min: 6.39282 ms
[05/15/2023-10:23:54] [I] max: 11.272 ms
[05/15/2023-10:23:54] [I] median: 7.11914 ms
[05/15/2023-10:23:54] [I] GPU Compute
[05/15/2023-10:23:54] [I] min: 17.4719 ms
[05/15/2023-10:23:54] [I] max: 36.2471 ms
[05/15/2023-10:23:54] [I] mean: 23.6288 ms
[05/15/2023-10:23:54] [I] median: 19.182 ms
[05/15/2023-10:23:54] [I] percentile: 36.1172 ms at 99%
[05/15/2023-10:23:54] [I] total compute time: 2.90634 s

For Device 2:

[05/15/2023-12:41:32] [I] === Performance summary ===
[05/15/2023-12:41:32] [I] Throughput: 12.5568 qps
[05/15/2023-12:41:32] [I] Latency: min = 66.3296 ms, max = 198.232 ms, mean = 79.1126 ms, median = 67.4297 ms, percentile(99%) = 198.232 ms
[05/15/2023-12:41:32] [I] End-to-End Host Latency: min = 66.3831 ms, max = 210.906 ms, mean = 79.6359 ms, median = 67.4835 ms, percentile(99%) = 210.906 ms
[05/15/2023-12:41:32] [I] Enqueue Time: min = 7.81299 ms, max = 13.1021 ms, mean = 10.4438 ms, median = 10.4058 ms, percentile(99%) = 13.1021 ms
[05/15/2023-12:41:32] [I] H2D Latency: min = 3.33521 ms, max = 10.3575 ms, mean = 4.20191 ms, median = 3.6214 ms, percentile(99%) = 10.3575 ms
[05/15/2023-12:41:32] [I] GPU Compute Time: min = 61.6451 ms, max = 189.297 ms, mean = 73.8214 ms, median = 62.764 ms, percentile(99%) = 189.297 ms
[05/15/2023-12:41:32] [I] D2H Latency: min = 0.906616 ms, max = 1.24756 ms, mean = 1.08936 ms, median = 1.09595 ms, percentile(99%) = 1.24756 ms
[05/15/2023-12:41:32] [I] Total Host Walltime: 3.18552 s
[05/15/2023-12:41:32] [I] Total GPU Compute Time: 2.95285 s
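As a sanity check when comparing the two logs, throughput is roughly the reciprocal of the mean end-to-end latency, so the roughly 3x latency gap and the roughly 3x throughput gap are the same symptom:

```python
# Sanity check: reported throughput (qps) should be roughly the reciprocal
# of the mean end-to-end host latency. Values are taken from the logs above.
device_means_ms = {
    "Device 1": 24.6089,  # mean end-to-end latency, Device 1
    "Device 2": 79.6359,  # mean end-to-end latency, Device 2
}

for name, mean_ms in device_means_ms.items():
    qps = 1000.0 / mean_ms
    print(f"{name}: {qps:.1f} qps")
```

This gives about 40.6 qps and 12.6 qps, matching the reported 40.6347 qps and 12.5568 qps.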

Both devices are in MAXN mode, and the power supplies are believed to be adequate.

What could be the reason for this performance difference, and how can I troubleshoot it?

Thanks.

Hi,

Just to confirm first:
Do you use the same model for testing on both devices?

Also, have you fixed the device clocks to the maximum?

$ sudo jetson_clocks

Thanks.

Yes, I use the same model for testing.

And the results didn't change after running jetson_clocks.

Thanks.

Edit:
When I check jetson_clocks with the --show argument, the EMC frequency seems to stay at its minimum, which is 204 MHz. Is this a problem? The console output is given below.

ubuntu@ubuntu:~$ sudo jetson_clocks --show
SOC family:tegra210 Machine:NVIDIA Jetson Nano Developer Kit
Online CPUs: 0-3
cpu0: Online=1 Governor=schedutil MinFreq=102000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu1: Online=1 Governor=schedutil MinFreq=102000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu2: Online=1 Governor=schedutil MinFreq=102000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu3: Online=1 Governor=schedutil MinFreq=102000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
GPU MinFreq=76800000 MaxFreq=921600000 CurrentFreq=921600000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=204000000 FreqOverride=1
NV Power Mode: MAXN
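The stuck EMC clock is visible directly in this output, since CurrentFreq equals MinFreq while MaxFreq is much higher. A minimal sketch in Python that checks this (the EMC line is copied from the output above):

```python
import re

# EMC line copied from the `jetson_clocks --show` output above.
emc_line = "EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=204000000 FreqOverride=1"

# Parse key=value pairs such as MinFreq=204000000 into a dict.
fields = dict(re.findall(r"(\w+)=(\d+)", emc_line))
min_hz, max_hz, cur_hz = (int(fields[k]) for k in ("MinFreq", "MaxFreq", "CurrentFreq"))

if cur_hz == min_hz < max_hz:
    print("EMC is stuck at its minimum frequency:", cur_hz // 1_000_000, "MHz")
```

This prints that the EMC is stuck at 204 MHz, while the maximum is 1600 MHz.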

Hi,

The GPU clock is fixed to the maximum, so jetson_clocks should be okay.
Could you share the model with us so we can reproduce this in our environment?

Thanks.

Hi,

I will share a different model, but the results are very similar:

GPU mean: 22 ms (device 1)
GPU mean: 69 ms (device 2)

mobilenetv2-7.onnx (13.6 MB)

Hi,

Could you also share the tegrastats for both devices with us?
Also, have you been able to reproduce the same issue on a newer GPU architecture?

Thanks.

Hi,

We found the issue with device 2: the device was broken due to an incorrect DTB (device tree blob) file update.

Timings are all good after re-flashing the device.

Thank you.