GTC 2020 S21351
Presenters: Mohammad Zulfiqar, NVIDIA; Robert Knight, NVIDIA
We’ll go behind the scenes of the Transformer model implementation in PyTorch to understand its performance weaknesses and make it scale across multiple nodes. We’ll describe an analysis of system-level profiling data from an example Transformer workload spanning multiple DGX-2 systems. We’ll present the tools, collection methods, and data-analytics recipes used to evaluate massive amounts of data and pinpoint the GPU or algorithm step causing issues. In general, the described methodology can be applied to iterative DL and HPC workloads to achieve significant scaling gains.
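As a rough illustration of what pinpointing a slow GPU or step might look like, here is a minimal Python sketch. It is not the presenters' tooling or recipe. It assumes the profiling data has already been reduced to one duration per (step, GPU) pair. The function name `find_stragglers`, the dictionary layout, and the 1.5x threshold are all hypothetical choices made for this example.

```python
from statistics import median


def find_stragglers(durations, threshold=1.5):
    """Return (step, gpu, seconds, step_median) for slow GPU/step pairs.

    durations: dict mapping (step, gpu) -> elapsed seconds.
    This layout is an assumption for the sketch, not a profiler format.
    A pair is flagged when its duration exceeds threshold times the
    median duration of all GPUs on that same step.
    """
    # Group the measurements by iteration step.
    by_step = {}
    for (step, gpu), seconds in durations.items():
        by_step.setdefault(step, {})[gpu] = seconds

    flagged = []
    for step in sorted(by_step):
        step_median = median(by_step[step].values())
        for gpu in sorted(by_step[step]):
            seconds = by_step[step][gpu]
            if seconds > threshold * step_median:
                flagged.append((step, gpu, seconds, step_median))
    return flagged


# Synthetic example data, not measurements from the session.
sample = {
    (0, 0): 1.00, (0, 1): 1.02, (0, 2): 0.98, (0, 3): 1.01,
    (1, 0): 1.01, (1, 1): 0.99, (1, 2): 2.40, (1, 3): 1.00,
}
stragglers = find_stragglers(sample)
for step, gpu, seconds, step_median in stragglers:
    print(f"step {step}, GPU {gpu}: {seconds:.2f}s vs median {step_median:.2f}s")
```

Comparing each GPU against the per-step median rather than a global average keeps one slow step from hiding a slow GPU. This suits iterative workloads where every GPU is expected to do similar work in each step.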