Is NVIDIA working on a TensorRT/FasterTransformer implementation for GPT-2 or other larger models, e.g., the Megatron-Turing Natural Language Generation model (MT-NLG), to support 2:4 sparsity?
As of now, the GitHub - NVIDIA/FasterTransformer repository states that sparsity is available only for BERT and the encoder.
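For context, 2:4 (fine-grained structured) sparsity keeps at most 2 non-zero values in every contiguous group of 4 weights, which is the pattern that Ampere sparse Tensor Cores accelerate. A minimal PyTorch sketch of the pattern (`prune_2to4` is an illustrative helper, not FasterTransformer code; it assumes the tensor size is divisible by 4):

```python
import torch

def prune_2to4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude values in every group of 4,
    producing the 2:4 structured-sparsity pattern."""
    w = weight.reshape(-1, 4)
    # indices of the two largest-magnitude entries per group of 4
    idx = w.abs().topk(2, dim=1).indices
    mask = torch.zeros_like(w, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 8)
print(prune_2to4(w))  # every row of 4 has exactly 2 zeros
```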
Please refer to the following post:
As per the link, 2:4 structured sparsity is only for Megatron.
Is there any plan to support sparsity for the 6.7-billion-parameter GPT-2 model?
Currently, we are not sure; it may be available in a future release.
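For anyone who wants to experiment in the meantime, here is a hedged sketch of pruning a PyTorch GPT-2 checkpoint to the 2:4 pattern with apex's Automatic SParsity (ASP) tooling. The model/optimizer setup is a placeholder, and this assumes apex is installed with the sparsity contrib module; it is not an official FasterTransformer workflow:

```python
import torch
from transformers import GPT2LMHeadModel  # placeholder model source
from apex.contrib.sparsity import ASP

model = GPT2LMHeadModel.from_pretrained("gpt2").cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Compute 2:4 masks for eligible layers and zero out the pruned weights;
# the model is then typically fine-tuned to recover accuracy.
ASP.prune_trained_model(model, optimizer)
```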
Why doesn't NVIDIA support sparsity for the ONNX BERT model? Could you describe the reason? Thank you.
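For reference, TensorRT 8+ does expose a builder flag that allows sparse Tensor Core tactics when the weights already follow the 2:4 pattern; whether it applies here depends on the TensorRT version and on the model actually being 2:4 pruned. A minimal sketch of building an engine from an ONNX BERT model (`bert.onnx` is a placeholder path):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("bert.onnx", "rb") as f:  # placeholder: a 2:4-pruned BERT export
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # allow sparse Tensor Core tactics
config.set_flag(trt.BuilderFlag.FP16)

serialized_engine = builder.build_serialized_network(network, config)
```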