In the TensorRT 10.8 release notes (Release Notes — NVIDIA TensorRT Documentation), the Limitations section contains one description,
but the subsequent Best Practices section contains the following description:
These two descriptions seem to conflict. So, does TensorRT 10.8 support inserting pointwise operations between matmul and softmax when using FP8?