Inconsistent inference time for uff model

Hi, I am using my custom model with DriveWorks for inference, but the inference time changes drastically between executions. At first it was around 26 ms for batch size 1 and 169 ms for batch size 6. Then it inferred the same batches in roughly 110 ms and 660 ms respectively. I used trtexec to check whether there was something wrong with my code, and the results were the same: I got two different inference times (26 ms and 110 ms) with the same batch size (1) for different executions of the trtexec command. I am not sure what causes the issue. I would be glad if you could enlighten me and help me solve it. Thanks in advance.
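(For reference, the batch size 6 measurement can be reproduced with the same legacy UFF-based trtexec command shown further down in this thread by adding the --batch flag; the model path here is a placeholder:)

./trtexec --uff=/path/to/model.uff --uffInput=Placeholder,3,480,760 --output=ln_segment/conv6/BiasAdd --output=fs_segment/conv6/BiasAdd --fp16 --workspace=2048 --batch=6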

Hardware Platform: [Example: DRIVE AGX Xavier™ Developer Kit, DRIVE AGX Pegasus™ Developer Kit]
Software Version: [Example: DRIVE Software 10]
Host Machine Version: [Ubuntu 18.04]
SDK Manager Version: [Example: 1.0.1.63443]

Hi @mugurcal,

Can you observe the inconsistent performance problem by running trtexec with other uff models?

I have tried with other uff files. I have also executed trtexec on another machine which has the same GPU. On the other machine I get 26 ms inference time consistently, and now I only get 110 ms on my machine. For now there is no fluctuation in the timings, but there is a huge performance difference between the machines.

Do you mean the inconsistency problem occurs not only with a specific uff file but also with other uff files? On a specific target system or between any systems? Could you share the trtexec commands and outputs? Thanks!

Yes, the inconsistency problem is not limited to a specific uff file. I think the problem is with my host machine, since I checked the uff file with trtexec on another host machine and also on the target platform. The trtexec command and output are as follows:

./trtexec --uff=/home/user/nvidia/tensorRT_ws/model.uff --uffInput=Placeholder,3,480,760 --output=ln_segment/conv6/BiasAdd --output=fs_segment/conv6/BiasAdd --fp16 --workspace=2048 
&&&& RUNNING TensorRT.trtexec # ./trtexec --uff=/home/user/nvidia/tensorRT_ws/model.uff --uffInput=Placeholder,3,480,760 --output=ln_segment/conv6/BiasAdd --output=fs_segment/conv6/BiasAdd --fp16 --workspace=2048
[I] uff: /home/user/nvidia/tensorRT_ws/model.uff
[I] uffInput: Placeholder,3,480,760
[I] output: ln_segment/conv6/BiasAdd
[I] output: fs_segment/conv6/BiasAdd
[I] fp16
[I] workspace: 2048
[I] Average over 10 runs is 112.692 ms (host walltime is 113.193 ms, 99% percentile time is 123.687).
[I] Average over 10 runs is 117.149 ms (host walltime is 117.538 ms, 99% percentile time is 127.359).
[I] Average over 10 runs is 114.324 ms (host walltime is 114.796 ms, 99% percentile time is 127.637).
[I] Average over 10 runs is 115.887 ms (host walltime is 116.507 ms, 99% percentile time is 124.467).
[I] Average over 10 runs is 110.371 ms (host walltime is 110.912 ms, 99% percentile time is 119.576).
[I] Average over 10 runs is 119.828 ms (host walltime is 120.39 ms, 99% percentile time is 145.555).
[I] Average over 10 runs is 137.248 ms (host walltime is 137.859 ms, 99% percentile time is 145.85).
[I] Average over 10 runs is 110.621 ms (host walltime is 111.128 ms, 99% percentile time is 118.925).
[I] Average over 10 runs is 110.911 ms (host walltime is 111.376 ms, 99% percentile time is 118.04).
[I] Average over 10 runs is 110.206 ms (host walltime is 110.685 ms, 99% percentile time is 117.555).
&&&& PASSED TensorRT.trtexec # ./trtexec --uff=/home/user/nvidia/tensorRT_ws/model.uff --uffInput=Placeholder,3,480,760 --output=ln_segment/conv6/BiasAdd --output=fs_segment/conv6/BiasAdd --fp16 --workspace=2048

Since yesterday, whenever I execute trtexec on my host machine I get the same results (110-120 ms inference time). I haven't seen the 26 ms inference time on my host machine again, while on the other host machine (with the same GPU model) the inference time is around 25-30 ms. I have removed and reinstalled CUDA, TensorRT, etc., but nothing has changed. Is there a log file specific to NVIDIA tools that I can check to understand what has changed on my system? Thanks.
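(One generic check, independent of any DriveWorks-specific log, is to watch whether the GPU is being held at reduced clocks or a low power state while trtexec runs, for example:)

nvidia-smi --query-gpu=clocks.sm,clocks.mem,pstate,power.draw,temperature.gpu --format=csv -l 1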

It’s clearer now. That’s why I didn’t observe the problem on either my host machine or the target platform.
Did you mean you didn’t have the problem on the same host machine before? What’s its GPU? Would setting up a fresh Ubuntu installation help?

The reason behind the problem is now understood. It was a power issue caused by the docking station connected to the host machine. Thanks for your reply.
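(For anyone hitting the same symptom: a power-limited GPU typically shows up under the Clocks Throttle Reasons section of nvidia-smi, e.g. SW Power Cap or HW Slowdown reported as Active:)

nvidia-smi -q -d PERFORMANCE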


Thanks for letting us know!