Is NVIDIA working on a TensorRT/FasterTransformer implementation for GPT-2 or other larger models, e.g., the Megatron-Turing Natural Language Generation model (MT-NLG), to support 2:4 sparsity?
As of now, the NVIDIA/FasterTransformer GitHub repository (Transformer related optimization, including BERT, GPT) states that sparsity is available only for BERT and Encoder.
Please refer to the following post,
As per the link, 2:4 structured sparsity is only for Megatron.
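For context, 2:4 structured sparsity means that at most two of every four consecutive weights are non-zero. Below is a minimal pure-Python sketch of that pattern, not FasterTransformer or TensorRT code. The function name `prune_2_of_4` is hypothetical, and choosing which weights to keep by magnitude is an assumption made for illustration.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each contiguous group of four.

    Illustrative only: magnitude-based selection is an assumption, and this
    helper is hypothetical, not part of any NVIDIA library.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        # Keep those two values and replace the other two with zero.
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
print(prune_2_of_4(row))
```

Each group of four in the result has exactly two zeroed positions, which is the layout that sparse hardware paths expect. Real workflows would typically prune and then fine-tune the model to recover accuracy.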
Is there any plan to support sparsity for the GPT-2 6.7 billion parameter model?
Currently, we are not sure about it. It may be available in future releases.