Accelerating Recommendation System Inference Performance with TensorRT

Originally published at: Accelerating Recommendation System Inference Performance with TensorRT | NVIDIA Technical Blog

NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. You can import trained models from every major deep learning framework into TensorRT and easily create highly efficient inference engines that can be incorporated into larger applications and services. This video demonstrates the steps for…
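As a rough sketch of the import-and-build workflow described above, the snippet below parses a model exported to ONNX and builds a serialized TensorRT engine with the Python API. The model filename and workspace size are placeholders, and the exact flags vary between TensorRT versions (this follows the TensorRT 8.x API); treat it as an illustration, not the blog's exact recipe.

```python
# Sketch: building a TensorRT inference engine from an ONNX model.
# Assumes TensorRT 8.x and a recommender model already exported to ONNX.

def build_engine(onnx_path, workspace_gib=2):
    """Parse an ONNX model and return a serialized TensorRT engine."""
    # Deferred import so the sketch can be read without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # Explicit-batch networks are required for ONNX models in TensorRT 8.x.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    # Cap scratch memory the builder may use for layer tactics.
    config.set_memory_pool_limit(
        trt.MemoryPoolType.WORKSPACE, workspace_gib << 30
    )
    return builder.build_serialized_network(network, config)
```

The serialized engine returned here would typically be saved to disk and later deserialized by a `trt.Runtime` inside the serving application.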