Video Tutorial: Accelerating Inference Performance of Recommendation Systems with TensorRT

Originally published at: Video Tutorial: Accelerating Inference Performance of Recommendation Systems with TensorRT | NVIDIA Technical Blog

NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. You can import trained models from all major deep learning frameworks into TensorRT and easily build highly efficient inference engines that can be incorporated into larger applications and services. This video demonstrates the steps…