High-Performance Next-Generation Deep-Learning Clusters

GTC 2020 S22047
Presenter: Julie Bernauer, NVIDIA
Abstract
From climate modeling to drug design, AI models are now fully part of scientific modeling, and they are growing larger and more complex every year. Until recently, system design for HPC and for AI was often done in isolation, since the requirements of the two platforms differed, making large-scale scientific experimentation difficult. To close this gap, systems are now designed with AI software in mind, and scale is built into the software design from the ground up, so that a model running at the edge can be trained in minutes at scale. The largest supercomputers in the world are now designed with AI in mind, while enterprise and AI research systems are being designed more like supercomputers. Last year, we showcased the DGX SuperPOD as an example of rapid time-to-floor for an AI performance infrastructure. Today, we'll cover where the next generation is headed, show how we think about design and infrastructure that can support the needs of AI research and development teams, and explain how modern AI frameworks and models are built to take advantage of these systems.
