Does TensorRT 10.8 support pointwise ops between matmul and softmax when using fused MHA?

lijinghaiwhu February 7, 2025, 3:24am

In the TensorRT 10.8 release notes (Release Notes — NVIDIA TensorRT Documentation), the Limitations section contains this description:

[image]

but the subsequent Best Practices section contains the following description:

[image]

These two descriptions seem to conflict. So, does TensorRT 10.8 support inserting pointwise operations between matmul and softmax when using FP8?
