How to understand "Autotuning format combination" in the trtexec log in fp16 mode

Description

The trtexec log contains lines like the following:

[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(160000,1:8,1600,8) -> Half(160000,1:8,1600,8) ***************
[11/24/2022-07:22:28] [V] [TRT] =============== Computing costs for 
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Float(1280000,20000,200,1), Float(1280000,20000,200,1) -> Float(1280000,20000,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Float(1280000,1,12800,64), Float(1280000,1,12800,64) -> Float(1280000,1,12800,64) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(1280000,20000,200,1), Half(1280000,20000,200,1) -> Half(1280000,20000,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(640000,20000:2,200,1), Half(640000,20000:2,200,1) -> Half(640000,20000:2,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(160000,1:8,1600,8), Float(1280000,20000,200,1) -> Float(1280000,20000,200,1) ***************

How should I understand “Half(160000,1:8,1600,8) → Half(160000,1:8,1600,8)”? Does it mean TensorRT splits the weights into formats that the hardware prefers?

Are there any docs that describe the blocking logic?
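
Edit: in case it helps, here is one consistent reading I came up with (my own assumption, not taken from any doc): the numbers look like per-dimension strides in (N, C, H, W) order, counted in units of vectors, with ":V" marking the dimension that is packed V elements per vector. The short Python check below reproduces every tuple in the log above if the tensor shape is (1, 64, 100, 200):

# Sketch decoding the stride tuples above; the shape (1, 64, 100, 200)
# is inferred from Float(1280000,20000,200,1) and is itself an assumption.
N, C, H, W = 1, 64, 100, 200

# Linear NCHW -> Float(1280000,20000,200,1)
linear = (C * H * W, H * W, W, 1)

# Channel-last NHWC -> Float(1280000,1,12800,64)
nhwc = (H * W * C, 1, W * C, C)

# C packed 2-wide (looks like kCHW2); strides counted in 2-element
# vectors -> Half(640000,20000:2,200,1), where 20000 carries the ":2" tag
v = 2
chw2 = (C * H * W // v, H * W, W, 1)

# Channel-last with C packed 8-wide (looks like kHWC8); strides counted in
# 8-element vectors -> Half(160000,1:8,1600,8), where 1 carries the ":8" tag
v = 8
hwc8 = (H * W * C // v, 1, W * C // v, C // v)

print(linear)  # (1280000, 20000, 200, 1)
print(nhwc)    # (1280000, 1, 12800, 64)
print(chw2)    # (640000, 20000, 200, 1)
print(hwc8)    # (160000, 1, 1600, 8)

Under this reading, these lines describe the input/output activation formats the autotuner is timing kernels for, rather than a split of the weights.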

Thanks in advance.

Environment

TensorRT Version: 8.4.1
GPU Type: V100
Nvidia Driver Version: 515
CUDA Version: 11.3

Hi,

Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?

Alternatively, you can try running your model with trtexec command.
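
For example (the model path here is only a placeholder):

trtexec --onnx=model.onnx --fp16 --verbose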

While measuring model performance, make sure you consider the latency and throughput of the network inference, excluding data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#measure-performance

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#model-accuracy

Thanks!

Hi, @NVES

Thanks for your reply. I am just wondering about the logic behind Half(160000,1:8,1600,8) → Half(160000,1:8,1600,8); are there any docs about it?

Converting the model from ONNX to TRT takes half an hour when we set the --fp16 flag. Are there any ways to accelerate this conversion so that it runs faster?

Thanks~

Hi,

We hope the following docs help you.
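
In particular, the "Data Format Descriptions" section of the TensorRT Developer Guide describes the vectorized layouts (kCHW2, kHWC8, and so on) that the stride tuples in your log correspond to.

Regarding the build time: most of it is spent timing candidate kernels for each format combination. If you rebuild the same model repeatedly, reusing a timing cache usually helps; for example (the file name is a placeholder; please check trtexec --help on your version for the exact flag):

trtexec --onnx=model.onnx --fp16 --timingCacheFile=timing.cache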

Thank you.