Hi,
Is NVIDIA working on a TensorRT/FasterTransformer implementation that supports 2:4 sparsity for GPT-2 or other larger models, e.g., the Megatron-Turing Natural Language Generation model (MT-NLG)?
As of now, GitHub - NVIDIA/FasterTransformer (Transformer related optimization, including BERT, GPT) states that sparsity is available only for BERT and the encoder.
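For context, 2:4 (sometimes written 2-4) structured sparsity means that in every contiguous group of four weights, at most two are nonzero, which is the pattern Ampere Sparse Tensor Cores can accelerate. Below is a minimal, illustrative sketch (not from this thread; the helper name is made up) that checks a weight tile for that pattern:

```python
# Minimal sketch: what the 2:4 structured sparsity pattern means.
# In every contiguous group of 4 weights, at most 2 may be nonzero.
import numpy as np

def is_2_4_sparse(weights: np.ndarray) -> bool:
    """Check that each contiguous group of 4 values has at most 2 nonzeros."""
    groups = weights.reshape(-1, 4)                # groups of 4 along the flattened weights
    nonzeros_per_group = np.count_nonzero(groups, axis=1)
    return bool(np.all(nonzeros_per_group <= 2))

# Example: a 2x8 weight tile pruned to the 2:4 pattern.
tile = np.array([[0.5, 0.0, 0.0, -1.2,  0.0, 0.3, 0.7, 0.0],
                 [0.0, 0.9, 0.4,  0.0, -0.8, 0.0, 0.0, 0.1]])
print(is_2_4_sparse(tile))   # True
```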
Hi,
Please refer to the following post:
Hi,
Sorry for the delayed response.
Currently, TensorRT doesn't support sparsity for transformers if ONNX is used.
The only way to use BERT with sparsity is the demoBERT sample in the TensorRT OSS repository.
Thank you.
Thank you.
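For reference, a minimal sketch of how sparse tactics are normally requested from the TensorRT builder is shown below (the ONNX file name is a placeholder). Note that, per the reply above, setting this flag on an ONNX-imported transformer does not mean the sparse transformer kernels will actually be used; those currently come through the demoBERT path.

```python
# Minimal sketch: requesting sparse-weight tactics from the TensorRT builder.
# "model.onnx" is a placeholder; this does not guarantee sparse kernels are
# selected for transformer layers imported via ONNX (see the reply above).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)             # sparse tactics are mostly FP16/INT8
config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)   # allow 2:4 sparse tactics where eligible

engine_bytes = builder.build_serialized_network(network, config)
```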
Thank you,
As per the link, 2:4 structured sparsity is only for Megatron.
Is there any plan to support sparsity for the 6.7-billion-parameter GPT-2 model?
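As a side note, pruning the weights to the 2:4 pattern on the framework side is a separate step from running the pruned model with sparse kernels in FasterTransformer/TensorRT (which is what this thread asks about). A minimal sketch, assuming NVIDIA Apex's ASP (Automatic SParsity) tooling and a CUDA-capable setup, with a toy model standing in for the actual GPT checkpoint:

```python
# Minimal sketch, assuming NVIDIA Apex's ASP (Automatic SParsity) is installed.
# ASP prunes eligible weights to the 2:4 pattern in PyTorch; runtime support
# for sparse kernels in FasterTransformer/TensorRT is a separate question.
import torch
from apex.contrib.sparsity import ASP

# Toy stand-in model; for a real run, pass the GPT model and its optimizer instead.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Prune trained weights to 2:4 and mask future optimizer updates,
# then fine-tune to recover accuracy.
ASP.prune_trained_model(model, optimizer)
```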
Hi,
Currently, we are not sure about it. It may be available in future releases.
Thank you.