Is it possible to add a TensorRT plugin to FasterTransformer Decoder and Decoding?

Dtracebug · September 27, 2021, 8:41am

Description

The decoder part of FasterTransformer is still the performance bottleneck of inference. Is it possible to add a TensorRT plugin to improve performance?

Environment

FasterTransformer Version: 4.0
TensorRT Version: 8.0
GPU Type: Ampere

NVES · September 27, 2021, 10:07am

Hi,
Please refer to the links below on custom plugin implementation and samples:

The IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively. However, we recommend that you write new plugins, or refactor existing ones, to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.

Thanks!

Dtracebug · September 29, 2021, 9:10am

Thank you for your reply. To reduce the development work for each user, does NVIDIA have any plans to optimize the decoder and decoding in a future version of FasterTransformer? Thank you.

Dtracebug · September 29, 2021, 11:13am

BTW, is there any way to run Transformer inference using TensorRT? BERT is already supported.

spolisetty · September 30, 2021, 4:51pm

Hi,

Currently we are not sure about it. Please stay tuned for updates on upcoming releases.

Thank you.