GTC 2020: Tuning GPU Server for DL Performance

GTC 2020 S21501
Presenters: Frank Han, Dell; Rengan Xu, Dell
Abstract
The MLPerf training benchmark is a software suite for measuring how fast systems can train models to a target quality metric. Version 0.6 provides good coverage of deep-learning models in image classification, object detection, translation, and reinforcement learning. We’ll use those subtests to demonstrate how different hardware configurations (CPU core count vs. frequency, memory frequency of 2666 vs. 2933 MHz, PCIe vs. NVLink) and storage options (local SSD, U.2 NVMe, Isilon, and Lustre) impact these DL training workloads. We’ll also discuss our work to characterize MLPerf benchmark performance using profiling tools (GPU, CPU, memory, and I/O), our hyperparameter tuning (batch size, learning rate, SGD optimizer), and our study of software environments (OS versions, CUDA drivers, Docker versions, NCCL P2P levels, NCCL tree vs. ring, etc.) and their effect on MLPerf performance on both single-node and distributed systems.
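As a rough illustration of the software-environment knobs mentioned in the abstract, the sketch below shows one way to toggle NCCL's collective algorithm (tree vs. ring) and P2P level via environment variables before launching a PyTorch distributed training job. The specific values and launcher shown are assumptions for illustration only; the abstract does not specify the exact configuration used in the presenters' MLPerf runs.

```python
# Minimal sketch: setting NCCL knobs for a PyTorch distributed run.
# The values below are illustrative assumptions, not the presenters'
# actual MLPerf configuration.
import os

import torch
import torch.distributed as dist

# Choose the NCCL collective algorithm: "Tree" or "Ring".
os.environ["NCCL_ALGO"] = "Tree"
# Limit when NCCL uses direct GPU peer-to-peer (e.g. only over NVLink).
os.environ["NCCL_P2P_LEVEL"] = "NVL"
# Print NCCL's chosen transports and algorithms for verification.
os.environ["NCCL_DEBUG"] = "INFO"


def main() -> None:
    # Assumes launch via torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # ... build the model, wrap it in DistributedDataParallel, train ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In practice these variables are often exported in the shell (or the job script) before launching the processes, so that every rank picks them up before the NCCL communicator is created.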

Watch this session
Join in the conversation below.

The PDF link is dead (404 Not Found); please fix it.