TensorRT 10.2 does not use FP8 convolution tactics when building an FP8-quantized Conv model


TensorRT 10.2 added support for normal FP8 convolutions on Hopper GPUs.
So I built an engine from a simple QDQ+Conv2d model quantized with TensorRT-Model-Optimizer, but the verbose build log shows that no FP8 Conv tactics are timed.

How can I get TensorRT to use FP8 convolutions?
I also have a few related questions:

  1. Does TRT 10.2 support normal FP8 convolutions on Ada Lovelace GPUs?
  2. Does TRT 10.2 support per-channel weight quantization for FP8?
    (Currently, TensorRT-Model-Optimizer supports only per-tensor quantization for FP8.)
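To make question 2 concrete, the difference between per-tensor and per-channel weight quantization can be sketched as a difference in how many scales are computed. This is a minimal sketch, assuming simple amax calibration with scale = amax / 448 (448 is the largest finite FP8 E4M3 value) and a Conv weight laid out as one flattened row per output channel; it is not how TensorRT-Model-Optimizer or TensorRT compute scales internally.

```python
# Sketch only: amax-based FP8 (E4M3) scale computation, per-tensor vs per-channel.
# Assumption: weights are given as one flattened list per output channel (axis 0).
from typing import List

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3
EPS = 1e-12           # guards against a zero scale for an all-zero channel


def per_tensor_scale(weights: List[List[float]]) -> float:
    """A single scale shared by every output channel."""
    amax = max(abs(w) for channel in weights for w in channel)
    return max(amax, EPS) / FP8_E4M3_MAX


def per_channel_scales(weights: List[List[float]]) -> List[float]:
    """One scale per output channel, so small-magnitude channels keep more precision."""
    return [max(max(abs(w) for w in channel), EPS) / FP8_E4M3_MAX for channel in weights]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25],  # channel 0: amax = 1.0
         [8.0, -2.0, 4.0]]   # channel 1: amax = 8.0
    print(per_tensor_scale(w))    # both channels share 8.0 / 448
    print(per_channel_scales(w))  # channel 0 gets 1.0 / 448, channel 1 gets 8.0 / 448
```

With per-tensor scaling, channel 0 is quantized with a scale eight times coarser than it needs, which is the accuracy cost per-channel quantization avoids.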


TensorRT Version: 10.2
GPU Type: H100
Nvidia Driver Version:
CUDA Version: 12.5
CUDNN Version: N/A
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): N/A
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): N/A

Relevant Files

simple_conv_fp8.onnx.zip (68.1 KB)

Steps To Reproduce

trtexec command I used:

$ trtexec --onnx=simple_conv_fp8.onnx --fp16 --fp8 --profilingVerbosity=detailed --verbose --exportLayerInfo=layerinfo.json
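One way to check which precision each convolution actually ran in is to inspect the exported `layerinfo.json`. The script below is a sketch that assumes the detailed layer-info JSON has a top-level `Layers` list whose entries carry `Name`, `LayerType`, `Inputs` (each with a `Format/Datatype` string) and `TacticName` keys; these key names are assumptions and may differ between TensorRT versions, so adjust them to match your file.

```python
# Sketch only: list Conv layers from a trtexec --exportLayerInfo JSON file
# and report whether their input formats mention FP8.
# Assumed keys: "Layers", "Name", "LayerType", "Inputs", "Format/Datatype", "TacticName".
import json
import os
import sys
from typing import Any, Dict, List


def conv_layer_precisions(layer_info: Dict[str, Any]) -> List[Dict[str, str]]:
    results = []
    for layer in layer_info.get("Layers", []):
        if not isinstance(layer, dict):
            # Without --profilingVerbosity=detailed, entries may be plain layer names.
            continue
        kind = str(layer.get("LayerType", "")) + " " + str(layer.get("Name", ""))
        if "Conv" not in kind:
            continue
        formats = [str(t.get("Format/Datatype", "?")) for t in layer.get("Inputs", [])]
        results.append({
            "name": str(layer.get("Name", "?")),
            "tactic": str(layer.get("TacticName", "?")),
            "inputs": ", ".join(formats),
        })
    return results


def main(path: str) -> None:
    with open(path) as f:
        info = json.load(f)
    for r in conv_layer_precisions(info):
        fp8 = "FP8" in r["inputs"]
        print(f"{r['name']}: inputs=[{r['inputs']}] tactic={r['tactic']} fp8_inputs={fp8}")


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "layerinfo.json"
    if os.path.isfile(path):
        main(path)
    else:
        print(f"{path} not found; run trtexec with --exportLayerInfo first")
```

Usage: `python check_fp8.py layerinfo.json` (the script name is hypothetical). If every Conv layer reports FP16 inputs, the builder fell back to non-FP8 tactics.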

Hi @pord74571,
We're checking on this and will update you soon.


Hello @AakankshaS,
Are there any updates?