GTC 2020: Performance and Model Fidelity of BERT Training from a Single DGX Through DGX SuperPod

GTC 2020 S21385
Presenters: Chris Forster, NVIDIA; Thor Johnsen, NVIDIA
Abstract
BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model that performs well on a wide variety of tasks, including (but not limited to) question answering, natural language inference, and classification. We’ll cover how you can use our open-source code to train BERT models yourself, from dataset creation to fine-tuning for specific NLP tasks, such as question answering with the SQuAD dataset. We’ll also discuss some of the challenges and solutions to delivering both computational performance and model fidelity on large distributed machines, such as the DGX SuperPod. We’ll offer a brief overview of the model itself, our choice of optimizers, performance optimizations, and testing methodology; describe running BERT at scales up to 1,472 GPUs; and summarize the results that our open-source multi-node BERT examples in TensorFlow and PyTorch can achieve.

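The session walks through NVIDIA's open-source TensorFlow and PyTorch BERT examples; those scripts are not reproduced here, but as a rough illustration of what SQuAD-style question-answering fine-tuning looks like, here is a minimal sketch using the Hugging Face `transformers` API as a stand-in. The model name, answer-span indices, and optimizer settings are illustrative assumptions, not details taken from the session.

```python
# Minimal, illustrative sketch of SQuAD-style fine-tuning for BERT.
# Assumption: uses the Hugging Face `transformers` API rather than the
# NVIDIA open-source examples discussed in the session; the model name and
# answer-span positions below are placeholders for demonstration only.
import torch
from transformers import BertTokenizerFast, BertForQuestionAnswering

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

question = "What does BERT stand for?"
context = "BERT (Bidirectional Encoder Representations from Transformers) is a language model."

# Encode the (question, context) pair into a single input sequence.
inputs = tokenizer(question, context, return_tensors="pt", truncation=True)

# Placeholder answer span (token indices of the gold answer within the input);
# a real SQuAD pipeline would derive these from character offsets in the data.
start_positions = torch.tensor([3])
end_positions = torch.tensor([9])

# One training step: the model predicts start/end logits for the answer span
# and returns a cross-entropy loss against the gold span.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
outputs.loss.backward()
optimizer.step()
print(f"loss: {outputs.loss.item():.4f}")
```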
Watch this session