GTC 2020: From Training to Inference: Maximizing Resource Usage and Reducing Cost with GPU Virtualization on VMware vSphere

nadeemm · March 23, 2020, 6:24am

GTC 2020 S21339
Presenters: Raj Rao, NVIDIA; Uday Kurkure, VMware; Lan Vu, VMware
Abstract
As machine learning and artificial intelligence are increasingly adopted across all industries, their share of data center workloads is growing. We’ll present use cases for optimizing the cost and resource usage of your data center for ML on VMware vSphere with GPU virtualization, especially with NVIDIA GRID. We’ll discuss the differences in resource utilization between training and inference, and showcase techniques to maximize the benefits of GPUs for your deep-learning workloads. These techniques include sharing a GPU among multiple concurrent users or workloads, using GPU scheduling policies, and optimizing for training and inference in a cloud environment. We’ll demonstrate how we applied these techniques in our real-world ML/AI applications at VMware and how they help us further improve the performance of these applications, enabling real-time analytics while reducing the cost of deployment with the latest Volta/Turing GPUs and NVIDIA GRID.

Watch this session
Join in the conversation below.