GTC 2020: Accelerating Linguistically-Informed BERT with Kubeflow at LinkedIn

GTC 2020 S22163
Presenters: Eddie Weill, NVIDIA; Abin Shahab, LinkedIn
Abstract
Kubeflow at LinkedIn has expanded beyond notebooks, training, and serving in the past year. We have now integrated ML workflows on Kubernetes with the Hadoop Distributed File System (HDFS). We’ll explain why we integrated Kerberized HDFS with Kubernetes, our implementation choices, and the challenges that remain. We’re also running multi-node, multi-GPU experiments with Kubeflow’s MPIJob operator, pre-training BERT on LinkedIn’s data, and tuning hyperparameters with Microsoft Neural Network Intelligence (NNI), using Kubeflow to schedule the distributed-training trials. We’ll discuss how we trained the models, fine-tuning, knowledge distillation, and model and experiment performance. We’re also training PyTorch models to generate member graph embeddings. We’ll discuss link prediction, including member-to-member links and member-to-entity links to skills, titles, and companies.
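For readers unfamiliar with the link-prediction setup the abstract refers to, the sketch below shows one common PyTorch formulation: members and entities (skills, titles, companies) share a learned embedding table, candidate links are scored with a dot product, and the model is trained with binary cross-entropy against sampled negatives. This is an illustrative assumption, not LinkedIn’s actual architecture; the class and function names (`LinkPredictor`, `training_step`) and the single shared node-ID space are inventions for this sketch.

```python
import torch
import torch.nn as nn


class LinkPredictor(nn.Module):
    """Minimal dot-product link-prediction model over learned node embeddings.

    Illustrative only: members and entities (skills, titles, companies) are
    assumed to live in one shared ID space for this sketch.
    """

    def __init__(self, num_nodes: int, dim: int = 128):
        super().__init__()
        self.emb = nn.Embedding(num_nodes, dim)
        nn.init.normal_(self.emb.weight, std=0.02)

    def forward(self, src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
        # Score a batch of (src, dst) node pairs with a dot product;
        # higher scores indicate a more likely link.
        return (self.emb(src) * self.emb(dst)).sum(dim=-1)


def training_step(model, optimizer, pos_src, pos_dst, neg_dst):
    """One binary-cross-entropy step with sampled negative destinations."""
    optimizer.zero_grad()
    pos_scores = model(pos_src, pos_dst)   # observed links -> label 1
    neg_scores = model(pos_src, neg_dst)   # sampled non-links -> label 0
    scores = torch.cat([pos_scores, neg_scores])
    labels = torch.cat([torch.ones_like(pos_scores),
                        torch.zeros_like(neg_scores)])
    loss = nn.functional.binary_cross_entropy_with_logits(scores, labels)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage with random edges; in the setting the talk describes, edge
    # data would come from HDFS and training would be distributed on
    # Kubernetes (e.g., via Kubeflow's MPIJob operator).
    model = LinkPredictor(num_nodes=10_000)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    pos_src = torch.randint(0, 10_000, (512,))
    pos_dst = torch.randint(0, 10_000, (512,))
    neg_dst = torch.randint(0, 10_000, (512,))
    print(training_step(model, opt, pos_src, pos_dst, neg_dst))
```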
