Scaling Language Model Training to a Trillion Parameters Using Megatron

jwitsoe · April 12, 2021, 5:00pm

Originally published at: Scaling Language Model Training to a Trillion Parameters Using Megatron | NVIDIA Technical Blog

Natural Language Processing (NLP) has seen rapid progress in recent years as computation at scale has become more available and datasets have become larger. At the same time, recent work has shown large language models to be effective few-shot learners, with high accuracy on many NLP datasets without additional finetuning. As a result, state-of-the-art NLP…

deepakn · April 12, 2021, 9:33pm

Happy to answer questions on the post or the work more broadly! More details are in our arXiv paper, "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM" (arXiv:2104.04473).

Our work is open sourced on GitHub at NVIDIA/Megatron-LM (ongoing research training transformer models at scale), and we would love for people to try it out!
