Description
The trtexec verbose log contains lines like the following:
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(160000,1:8,1600,8) -> Half(160000,1:8,1600,8) ***************
[11/24/2022-07:22:28] [V] [TRT] =============== Computing costs for
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Float(1280000,20000,200,1), Float(1280000,20000,200,1) -> Float(1280000,20000,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Float(1280000,1,12800,64), Float(1280000,1,12800,64) -> Float(1280000,1,12800,64) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(1280000,20000,200,1), Half(1280000,20000,200,1) -> Half(1280000,20000,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(640000,20000:2,200,1), Half(640000,20000:2,200,1) -> Half(640000,20000:2,200,1) ***************
[11/24/2022-07:22:28] [V] [TRT] *************** Autotuning format combination: Half(160000,1:8,1600,8), Float(1280000,20000,200,1) -> Float(1280000,20000,200,1) ***************
How should I read "Half(160000,1:8,1600,8) -> Half(160000,1:8,1600,8)"? Does TensorRT split the weights into some formats that the hardware prefers?
Is there any documentation that describes this blocking logic?
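To make the question concrete, here is how I am currently reading these entries. This is only my guess, not something I found in the docs: I assume the name is the data type, the comma-separated numbers are per-dimension strides, and a field written as `a:b` marks the dimension that is packed into vectors of `b` elements. The helper name `parse_format` is my own and is not part of TensorRT or trtexec.

```python
import re

def parse_format(desc):
    """Split one log entry such as 'Half(160000,1:8,1600,8)' into parts.

    Assumption (unconfirmed): the fields are per-dimension strides, and
    'stride:width' marks the dimension that is vectorized by 'width'.
    """
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", desc)
    if match is None:
        raise ValueError(f"unrecognized format description: {desc!r}")
    dtype, body = match.groups()

    strides = []
    vector_dim = None   # index of the vectorized dimension, if any
    vector_width = 1    # 1 means "not vectorized"
    for index, field in enumerate(body.split(",")):
        stride, _, width = field.strip().partition(":")
        strides.append(int(stride))
        if width:
            vector_dim = index
            vector_width = int(width)

    return {
        "dtype": dtype,
        "strides": strides,
        "vector_dim": vector_dim,
        "vector_width": vector_width,
    }

if __name__ == "__main__":
    for entry in ("Half(160000,1:8,1600,8)",
                  "Float(1280000,20000,200,1)",
                  "Half(640000,20000:2,200,1)"):
        print(entry, "->", parse_format(entry))
```

Under that reading, "Half(160000,1:8,1600,8)" would be an FP16 tensor whose second dimension (channels, I assume) is packed in groups of 8, while "Float(1280000,20000,200,1)" would be a plain linear FP32 layout. Is this interpretation correct?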
Thanks in advance.
Environment
TensorRT Version: 8.4.1
GPU Type: V100
Nvidia Driver Version: 515
CUDA Version: 11.3