GTC 2020: Distributed Machine Learning on Virtualized Servers

GTC 2020 S21191
Presenters: Luke Wignall, NVIDIA; Mohan Potheri, VMware; Boris Kovalev, Mellanox
Horovod is a distributed training framework that can leverage GPUs for deep learning. We’ll talk about a joint project between NVIDIA, Mellanox, and VMware to build a high-performance platform on NVIDIA vCompute Server, Mellanox-based high-speed networking, and vSphere PVRDMA. We’ll compare the results of common benchmarks run with and without PVRDMA. We’ll also discuss a reference architecture for using vCompute Server for ML.

