Scaling Language Model Training to a Trillion Parameters Using Megatron

You know what's cooler than a billion parameters? A TRILLION parameters.
