State-of-the-Art Language Modeling Using Megatron on the NVIDIA A100 GPU

Originally published at: https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/

Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as question answering, dialog systems, summarization, and article completion. However, during training, large models do not fit in the available memory of a single accelerator, requiring model parallelism to split the parameters across multiple…