GTC 2020: Advanced Optimizations of Persistent Recurrent Neural Networks

GTC 2020 S21691
Presenters: Vasily Volkov, NVIDIA; Jeremy Appleyard, NVIDIA
Abstract
Recurrent Neural Networks (RNNs) with small batch sizes tend to be bandwidth-bound when implemented naively. Persisting the majority of the inputs in low-level GPU memory can turn the problem back into a compute-bound one and yield order-of-magnitude speedups. We’ll dive into the methods we use to achieve performance in cuDNN’s persistent RNN implementation, many of which are applicable to other persistent methods.
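
The bandwidth-bound versus compute-bound claim can be illustrated with a rough arithmetic-intensity estimate (FLOPs per byte of off-chip traffic) for one recurrent matrix product. This is only a back-of-the-envelope sketch and not the presenters' model: the function names, the 2-byte element size, the hidden size of 1024, and the batch size of 4 are all assumptions chosen for illustration, and it ignores gates, biases, and caching effects.

```python
# Simplified model (assumption): one timestep is a single (hidden x hidden)
# weight matrix multiplied by a (hidden x batch) activation matrix.

def flops_per_step(hidden, batch):
    # One multiply and one add per weight element per batch column.
    return 2 * hidden * hidden * batch

def bytes_per_step(hidden, batch, bytes_per_elem=2, weights_persist=False):
    # Activations are read once and written once every timestep.
    activation_elems = 2 * hidden * batch
    # Weights are re-read from off-chip memory each step unless they stay on chip.
    weight_elems = 0 if weights_persist else hidden * hidden
    return bytes_per_elem * (activation_elems + weight_elems)

def arithmetic_intensity(hidden, batch, **kwargs):
    return flops_per_step(hidden, batch) / bytes_per_step(hidden, batch, **kwargs)

# Hypothetical sizes, picked only to show the trend.
naive = arithmetic_intensity(1024, 4)
persistent = arithmetic_intensity(1024, 4, weights_persist=True)
print(f"naive: {naive:.2f} FLOPs/byte, persistent: {persistent:.2f} FLOPs/byte")
```

Under these assumptions the naive case performs only about 4 FLOPs per byte moved, because the weight matrix dominates the traffic at small batch sizes, while keeping the weights on chip raises the ratio to 512 FLOPs per byte. Whether a given GPU is then actually compute-bound depends on its ratio of peak compute to memory bandwidth, which this sketch does not model.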
