The max GPU Compute Time reported by trtexec is sometimes very large

I measure the DNN's performance using trtexec as follows.
My environment is a Jetson AGX Orin Developer Kit (AGX Orin 32GB emulation mode, power mode: 40W).

$ trtexec --loadEngine=test_model.trt --verbose 2>/dev/null | grep -e RUNNING -e "[I] GPU Compute"

[09/19/2023-18:29:31] [I] GPU Compute Time: min = 2.83447 ms, max = 3.14429 ms, mean = 2.88202 ms, median = 2.8493 ms, percentile(90%) = 3.05914 ms, percentile(95%) = 3.07028 ms, percentile(99%) = 3.0802 ms

However, I sometimes see that the max time is very large, as shown below.
The max value is about 2-2.5 times larger than the min, mean, and median values.

hiro@rpj-desktop:~/work/rpj_samples/trt_performance$ trtexec --loadEngine=test_model.trt --verbose 2>/dev/null | grep -e RUNNING -e "[I] GPU Compute"
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # trtexec --loadEngine=test_model.trt --verbose
[09/19/2023-18:29:12] [I] GPU Compute Time: min = 2.8313 ms, max = 7.27428 ms, mean = 2.93425 ms, median = 2.84644 ms, percentile(90%) = 3.06354 ms, percentile(95%) = 3.07941 ms, percentile(99%) = 4.40619 ms

hiro@rpj-desktop:~/work/rpj_samples/trt_performance$ trtexec --loadEngine=test_model.trt --verbose 2>/dev/null | grep -e RUNNING -e "[I] GPU Compute"
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # trtexec --loadEngine=test_model.trt --verbose
[09/19/2023-18:29:21] [I] GPU Compute Time: min = 2.83105 ms, max = 5.54698 ms, mean = 2.92506 ms, median = 2.85059 ms, percentile(90%) = 3.07965 ms, percentile(95%) = 3.09659 ms, percentile(99%) = 4.6609 ms

So I would like to ask the following three questions:

Question 1) Is it normal for the max value to be 2.5 times larger than the mean value?

Question 2) Why is the max value 2.5 times larger than the mean value?
Could you explain the cause?

Question 3) Is there a workaround to keep the max value stable?

I have attached my scripts below.
Please use test_model_exec.sh to reproduce the issue.

trt_performance.tar.gz (47.5 KB)

Regards,
hiro

Hi,

1.
It's common that the first CUDA kernel launch takes longer.
Could you add a warmup loop and check whether the issue remains?

$ /usr/src/tensorrt/bin/trtexec --warmUp=N ...
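
For example, a longer warmup and a longer measurement window can be requested like this (the values below are only illustrative; --warmUp is given in milliseconds and --duration in seconds):

$ /usr/src/tensorrt/bin/trtexec --loadEngine=test_model.trt --warmUp=2000 --duration=10 2>/dev/null | grep -e RUNNING -e "[I] GPU Compute"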

2.
Please check whether the max value is measured on the first inference.
If so, it is expected, since some initialization is required (e.g., library loading).
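
If it is not obvious from the summary line, trtexec can also export per-iteration timings to a JSON file via --exportTimes (a sketch; the exact field names in the file may differ between TensorRT versions):

$ /usr/src/tensorrt/bin/trtexec --loadEngine=test_model.trt --exportTimes=times.json 2>/dev/null

You can then check whether the largest compute time belongs to one of the first iterations.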

3.
If the longer execution time only occurs on the first launch, you can add a warmup loop to avoid it.

Thanks.


Dear @AastaLLL

Thank you for your reply.
I understand the workaround.

Regards,
hiro

Dear @AastaLLL,

I found that this issue is related to the idle clock mode.
When I maximize the Jetson Orin performance as follows, I get stable performance.

$ sudo /usr/bin/jetson_clocks

Jetson Orin NX Series and Jetson AGX Orin Series — Jetson Linux Developer Guide documentation (nvidia.com)
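
A minimal sketch of how this can be applied and verified (assuming the --show/--store/--restore options of jetson_clocks are available on this Jetson Linux release):

$ sudo /usr/bin/jetson_clocks --store    # save the current clock settings
$ sudo /usr/bin/jetson_clocks            # lock the clocks (including GPU) to the maximum
$ sudo /usr/bin/jetson_clocks --show     # confirm the clocks are pinned
$ sudo /usr/bin/jetson_clocks --restore  # revert to the saved settings when done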

Hi,

Thanks for the feedback.
Here is some information for your reference.

By default, Jetson uses dynamic frequency scaling, so execution might take longer if the GPU starts from a low clock rate.
jetson_clocks is a script that locks the hardware clocks (including the GPU) to their maximum, so the performance becomes much more stable.
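
One informal way to observe this (a suggestion, not part of the original measurement): run tegrastats in a second terminal while trtexec is executing; the GR3D_FREQ field shows the GPU clock ramping up from its idle frequency during the first iterations, and staying at the maximum once jetson_clocks has been applied.

$ sudo tegrastats --interval 100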

Thanks.

