Description
TensorRT 10.2 added support for normal FP8 convolutions on Hopper GPUs.
I built an engine from a simple QDQ+Conv2d model quantized with TensorRT-Model-Optimizer, but no FP8 Conv tactics are timed.
How can I use FP8 convolutions?
In addition, I have some questions.
- Does TRT 10.2 support normal FP8 convolutions on Ada Lovelace GPUs?
- Does TRT 10.2 support per-channel weight quantization?
(Currently, TensorRT-Model-Optimizer supports only per-tensor scales for FP8 quantization.)
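To make the second question concrete, here is a minimal pure-Python sketch of the difference between the two scaling schemes, assuming FP8 E4M3 (maximum representable magnitude 448). The weight values are illustrative, not taken from the attached model.

```python
# Sketch: per-tensor vs per-channel scale computation for FP8 (E4M3).
# E4M3's maximum representable magnitude is 448; the scale maps the
# observed absolute maximum (amax) of the weights onto that range.
# The toy tensor below is illustrative only.

E4M3_MAX = 448.0

def per_tensor_scale(weights):
    """One scale for the whole tensor: amax over every element."""
    amax = max(abs(w) for row in weights for w in row)
    return amax / E4M3_MAX

def per_channel_scales(weights):
    """One scale per output channel (row): amax over that row only."""
    return [max(abs(w) for w in row) / E4M3_MAX for row in weights]

# Toy 2-channel weight tensor: channel 0 has a much larger range
# than channel 1, so a shared per-tensor scale wastes channel 1's
# dynamic range, while per-channel scales preserve it.
w = [[448.0, -224.0],
     [4.48, 2.24]]

print(per_tensor_scale(w))
print(per_channel_scales(w))
```

With per-tensor scaling the small-magnitude channel is forced to share the large channel's scale; per-channel scaling gives each output channel its own, which is why the restriction matters for accuracy.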
Environment
TensorRT Version: 10.2
GPU Type: H100
Nvidia Driver Version:
CUDA Version: 12.5
CUDNN Version: N/A
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): N/A
Relevant Files
simple_conv_fp8.onnx.zip (68.1 KB)
Steps To Reproduce
The trtexec command I used:
$ trtexec --onnx=simple_conv_fp8.onnx --fp16 --fp8 --profilingVerbosity=detailed --verbose --exportLayerInfo=layerinfo.json