How to understand "Autotuning format combination" in the trtexec log in fp16 mode

Description

The trtexec log contains lines like the following:

[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(160000,1:8,1600,8) -> Half(160000,1:8,1600,8) ***************
[11/24/2022-07:22:28] [V] [TRT] =============== Computing costs for 
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Float(1280000,20000,200,1), Float(1280000,20000,200,1) -> Float(1280000,20000,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Float(1280000,1,12800,64), Float(1280000,1,12800,64) -> Float(1280000,1,12800,64) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(1280000,20000,200,1), Half(1280000,20000,200,1) -> Half(1280000,20000,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(640000,20000:2,200,1), Half(640000,20000:2,200,1) -> Half(640000,20000:2,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(160000,1:8,1600,8), Float(1280000,20000,200,1) -> Float(1280000,20000,200,1) ***************

How should I understand “Half(160000,1:8,1600,8) → Half(160000,1:8,1600,8)”? Does it mean TensorRT splits the weights into formats that the hardware prefers?

Are there any docs that describe the blocking logic?
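
Edit: in case it helps, here is one consistent reading I came up with (my own assumption, not taken from any doc): the numbers look like per-dimension strides in (N, C, H, W) order, counted in units of vectors, with ":V" marking the dimension that is packed V elements per vector. The short Python check below reproduces every tuple in the log above if the tensor shape is (1, 64, 100, 200):

# Sketch decoding the stride tuples above; the shape (1, 64, 100, 200)
# is inferred from Float(1280000,20000,200,1) and is itself an assumption.
N, C, H, W = 1, 64, 100, 200

# Linear NCHW -> Float(1280000,20000,200,1)
linear = (C * H * W, H * W, W, 1)

# Channel-last NHWC -> Float(1280000,1,12800,64)
nhwc = (H * W * C, 1, W * C, C)

# C packed 2-wide (looks like kCHW2); strides counted in 2-element
# vectors -> Half(640000,20000:2,200,1), where 20000 carries the ":2" tag
v = 2
chw2 = (C * H * W // v, H * W, W, 1)

# Channel-last with C packed 8-wide (looks like kHWC8); strides counted in
# 8-element vectors -> Half(160000,1:8,1600,8), where 1 carries the ":8" tag
v = 8
hwc8 = (H * W * C // v, 1, W * C // v, C // v)

print(linear)  # (1280000, 20000, 200, 1)
print(nhwc)    # (1280000, 1, 12800, 64)
print(chw2)    # (640000, 20000, 200, 1)
print(hwc8)    # (160000, 1, 1600, 8)

Under this reading, these lines describe the input/output activation formats the autotuner is timing kernels for, rather than a split of the weights.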

Thanks in advance.

Environment

TensorRT Version: 8.4.1
GPU Type: V100
Nvidia Driver Version: 515
CUDA Version: 11.3

Hi,

Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?

Alternatively, you can try running your model with trtexec command.
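
For example (the model path here is only a placeholder):

trtexec --onnx=model.onnx --fp16 --verbose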

While measuring model performance, make sure you consider the latency and throughput of the network inference, excluding data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#measure-performance

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#model-accuracy

Thanks!

Hi, @NVES

Thanks for your reply. I am just wondering about the logic behind Half(160000,1:8,1600,8) → Half(160000,1:8,1600,8); are there any docs about it?

Converting the model from ONNX to TRT takes half an hour when we set the --fp16 flag. Are there any ways to accelerate this conversion so that it runs faster?

Thanks~

Hi,

We hope the following docs help you.
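
In particular, the "Data Format Descriptions" section of the TensorRT Developer Guide describes the vectorized layouts (kCHW2, kHWC8, and so on) that the stride tuples in your log correspond to.

Regarding the build time: most of it is spent timing candidate kernels for each format combination. If you rebuild the same model repeatedly, reusing a timing cache usually helps; for example (the file name is a placeholder; please check trtexec --help on your version for the exact flag):

trtexec --onnx=model.onnx --fp16 --timingCacheFile=timing.cache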

Thank you.