Is it possible to add a TensorRT plugin to the FasterTransformer Decoder and Decoding?


The decoder part of FasterTransformer is still the performance bottleneck of inference. Is it possible to add a TensorRT plugin to improve the performance?


FasterTransformer Version: 4.0
TensorRT Version: 8.0
GPU Type: Ampere

Please refer to the links below for custom plugin implementation and samples:

The IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively. However, we recommend that you write new plugins, or refactor existing ones, to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.


Thank you for your reply. To spare each user the development work, does NVIDIA have any plans to optimize the decoder and decoding in a future version of FasterTransformer? Thank you.

BTW, is there any way to run Transformer inference using TensorRT? BERT is already supported.


We are not sure about this at the moment. Please stay tuned for updates in upcoming releases.

Thank you.