The decoder in FasterTransformer is still the performance bottleneck during inference. Is it possible to add a TensorRT plugin to improve its performance?
Hi,
Please refer to the links below for custom plugin implementation details and a sample:
The IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively. However, we recommend that you write new plugins, or refactor existing ones, to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.
Thank you for your reply. To reduce the development work for each user, does NVIDIA have any plans to optimize the decoder and decoding in a future version of FasterTransformer? Thank you.