GTC 2020: NVTabular: GPU Accelerated ETL for Recommender Systems

GTC 2020 S21651
Presenters: Julio Perez, NVIDIA; Even Oldridge, NVIDIA
Abstract
Recommender systems require massive datasets to train, particularly for deep-learning-based solutions. Transforming these datasets to prepare them for model training is particularly challenging. Steps such as feature engineering, categorical encoding, and normalization of continuous variables often take longer than training the model itself. NVTabular is an open-source feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate the terabyte-scale datasets used to train deep-learning-based recommender systems. It provides a high-level abstraction that simplifies code and speeds up development, and it accelerates computation on the GPU using the RAPIDS cuDF library. It is available for download and contributions on GitHub at NVIDIA-Merlin/NVTabular. As part of the Merlin Recommenders Framework, it pairs with HugeCTR to provide a straightforward way to train huge deep-learning-based recommender systems on GPU.
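For readers unfamiliar with the preprocessing steps named in the abstract, here is a minimal CPU-only sketch of categorical encoding and normalization of a continuous variable in plain Python. It does not use NVTabular or cuDF and is not their API; the function names `categorify` and `normalize` and the sample columns are hypothetical, chosen only for illustration.

```python
import math


def categorify(values):
    """Map each distinct category to an integer id, in order of first appearance.

    Hypothetical helper for illustration; not the NVTabular API.
    """
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


def normalize(values):
    """Standardize a continuous column to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # A constant column carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


# Made-up sample columns, standing in for a tabular interaction log.
item_category = ["books", "games", "books", "music"]
price = [1.0, 2.0, 3.0, 2.0]

encoded_category, category_ids = categorify(item_category)
normalized_price = normalize(price)
```

On a terabyte-scale dataset, both steps require a full pass over the data to collect the category vocabulary and the column statistics before any row can be transformed, which is the kind of work the abstract describes as often taking longer than model training.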
