Description
TensorRT 10.2 added support for normal FP8 convolutions on Hopper GPUs.
I built an engine from a simple QDQ+Conv2d model quantized with TensorRT-Model-Optimizer, but no FP8 Conv tactics are timed.
How can I use FP8 convolutions?
In addition, I have some questions.
- Does TRT 10.2 support normal FP8 convolutions on Ada Lovelace GPUs?
- Does TRT 10.2 support per-channel weight quantization?
(Currently, TensorRT-Model-Optimizer supports only per-tensor scales for FP8 quantization.)
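To make the second question concrete, here is a minimal pure-Python sketch of the difference between the two scaling schemes, assuming FP8 E4M3 (maximum representable magnitude 448). The weight values are illustrative, not taken from the attached model.

```python
# Sketch: per-tensor vs per-channel scale computation for FP8 (E4M3).
# E4M3's maximum representable magnitude is 448; the scale maps the
# observed absolute maximum (amax) of the weights onto that range.
# The toy tensor below is illustrative only.

E4M3_MAX = 448.0

def per_tensor_scale(weights):
    """One scale for the whole tensor: amax over every element."""
    amax = max(abs(w) for row in weights for w in row)
    return amax / E4M3_MAX

def per_channel_scales(weights):
    """One scale per output channel (row): amax over that row only."""
    return [max(abs(w) for w in row) / E4M3_MAX for row in weights]

# Toy 2-channel weight tensor: channel 0 has a much larger range
# than channel 1, so a shared per-tensor scale wastes channel 1's
# dynamic range, while per-channel scales preserve it.
w = [[448.0, -224.0],
     [4.48, 2.24]]

print(per_tensor_scale(w))
print(per_channel_scales(w))
```

With per-tensor scaling the small-magnitude channel is forced to share the large channel's scale; per-channel scaling gives each output channel its own, which is why the restriction matters for accuracy.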
Environment
TensorRT Version: 10.2
GPU Type: H100
Nvidia Driver Version:
CUDA Version: 12.5
CUDNN Version: N/A
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): N/A
Relevant Files
simple_conv_fp8.onnx.zip (68.1 KB)
Steps To Reproduce
The trtexec command I used:
$ trtexec --onnx=simple_conv_fp8.onnx --fp16 --fp8 --profilingVerbosity=detailed --verbose --exportLayerInfo=layerinfo.json