GTC 2020: Wide and Deep Recommender Inference on GPU

GTC 2020 S21559
Presenters: Alec Gunny, NVIDIA; Chirayu Garg, NVIDIA
Abstract
We’ll discuss using GPUs to accelerate so-called “wide and deep” models in the recommendation inference setting. Machine learning-powered recommender systems permeate modern online platforms. Wide and deep models have become a popular choice for recommendation problems due to their high expressiveness compared to more traditional machine learning models, and the ease with which they can be trained and deployed using TensorFlow. We’ll demonstrate simple APIs to convert trained canned TensorFlow estimators to TensorRT executable engines and deploy them for inference using NVIDIA’s TensorRT Inference Server. The generated TensorRT engines can also be configured for reduced-precision computation that leverages tensor cores in NVIDIA GPUs. Finally, we’ll show how to integrate these served models into an optimized inference pipeline, exploiting shared request-level features across batches of items to minimize network traffic and fully leverage GPU acceleration.
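The conversion APIs shown in the session are NVIDIA's own; as a rough illustration of the same workflow, here is a minimal sketch that instead uses the stock TF-TRT converter shipped with TensorFlow. The feature columns, hidden-unit sizes, batch size, and directory names are hypothetical placeholders, and the training step is elided.

```python
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Hypothetical wide (linear) and deep (DNN) feature columns.
wide_columns = [
    tf.feature_column.categorical_column_with_hash_bucket(
        "item_id", hash_bucket_size=10000)
]
deep_columns = [tf.feature_column.numeric_column("user_age")]

# A canned wide-and-deep estimator.
estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[256, 128],
)
# ... estimator.train(...) goes here ...

# Export the trained estimator to a SavedModel.
feature_spec = tf.feature_column.make_parse_example_spec(
    wide_columns + deep_columns)
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    feature_spec)
saved_model_dir = estimator.export_saved_model("export", serving_fn).decode("utf-8")

# Rewrite the SavedModel with TF-TRT. FP16 lets TensorRT run eligible
# layers in half precision on the GPU's tensor cores.
converter = trt.TrtGraphConverter(
    input_saved_model_dir=saved_model_dir,
    max_batch_size=64,
    precision_mode="FP16",
)
converter.convert()
converter.save("trt_saved_model")
```

The converted model can then be placed in a TensorRT Inference Server model repository. A minimal config.pbtxt might look like the sketch below; the model name and batch size are assumptions, and input/output specifications can either be listed explicitly or, for TensorFlow models, derived by the server.

```
# models/wide_and_deep/config.pbtxt (hypothetical layout)
name: "wide_and_deep"
platform: "tensorflow_savedmodel"
max_batch_size: 64
```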
