GTC 2020: Scaling the Transformer Model Implementation in PyTorch Across Multiple Nodes

GTC 2020 S21351
Presenters: Mohammad Zulfiqar, NVIDIA; Robert Knight, NVIDIA
Abstract
We’ll dive deep behind the scenes into the Transformer model implementation in PyTorch to understand its performance weaknesses and work to make it scale across multiple nodes. We’ll describe an analysis of system-level profiling data from an example Transformer workload spanning multiple DGX-2 systems. We’ll present the tools, collection methods, and data-analytics recipes used to evaluate massive amounts of data and pinpoint the GPU and algorithm step causing issues. The described methodology can be applied to iterative DL and HPC workloads in general to achieve significant scaling gains.
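The abstract itself contains no code, but as a rough illustration of the kind of per-step annotation that system-level profiling of an iterative workload typically relies on, here is a minimal sketch using PyTorch's NVTX bindings (torch.cuda.nvtx), so a profiler such as Nsight Systems can attribute GPU time to algorithm steps. The model, data shapes, and range names are hypothetical placeholders, not taken from the session.

# Minimal sketch (assumptions noted above): annotate each phase of an
# iterative training loop with NVTX ranges so system-level profiling
# can attribute GPU time to a specific step of the algorithm.
import torch
import torch.nn as nn

# Hypothetical stand-in for the Transformer workload under study.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

for step in range(10):
    # Synthetic data in (seq_len, batch, d_model) layout.
    src = torch.randn(32, 64, 512, device="cuda")
    target = torch.randn(32, 64, 512, device="cuda")

    torch.cuda.nvtx.range_push(f"step_{step}")

    torch.cuda.nvtx.range_push("forward")
    loss = criterion(model(src), target)
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("backward")
    optimizer.zero_grad()
    loss.backward()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("optimizer")
    optimizer.step()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_pop()  # close step range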
