GTC 2020 S21691
Presenters: Vasily Volkov, NVIDIA; Jeremy Appleyard, NVIDIA
Recurrent Neural Networks (RNNs) with small batch sizes tend to be bandwidth-bound when implemented naively. Persisting the majority of the inputs in low-level GPU memory can turn the problem back into a compute-bound one and yield order-of-magnitude speedups. We’ll dive into the methods we use to achieve high performance in cuDNN’s persistent RNN implementation, many of which are applicable to other persistent methods.
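The bandwidth-bound versus compute-bound claim can be illustrated with a rough roofline-style calculation. The sketch below is not taken from the session: it assumes the persisted data is the recurrent weight matrix, models one recurrent step as a single matrix product, and uses hypothetical device numbers (PEAK_FLOPS, PEAK_BANDWIDTH) and a hypothetical hidden size chosen only for illustration.

```python
def arithmetic_intensity(hidden_size, batch_size, bytes_per_element=2,
                         weights_persistent=False):
    """FLOPs per byte of off-chip traffic for one step h_new = W @ h.

    W is hidden_size x hidden_size. If weights_persistent is False, W is
    assumed to be re-read from off-chip memory on every time step.
    """
    flops = 2 * hidden_size * hidden_size * batch_size  # one multiply + one add
    # Read h and write h_new once per step.
    traffic = 2 * hidden_size * batch_size * bytes_per_element
    if not weights_persistent:
        traffic += hidden_size * hidden_size * bytes_per_element
    return flops / traffic


def is_bandwidth_bound(intensity, peak_flops, peak_bytes_per_s):
    """Roofline test: bandwidth-bound when intensity is below machine balance."""
    return intensity < peak_flops / peak_bytes_per_s


# Hypothetical device numbers, for illustration only.
PEAK_FLOPS = 100e12      # 100 TFLOP/s
PEAK_BANDWIDTH = 1e12    # 1 TB/s

for persistent in (False, True):
    ai = arithmetic_intensity(1024, 1, weights_persistent=persistent)
    bound = is_bandwidth_bound(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
    print(f"persistent={persistent}: {ai:.2f} FLOP/byte, bandwidth-bound={bound}")
```

Under these assumptions, re-reading the weights at batch size 1 gives roughly 1 FLOP per byte, far below the assumed machine balance of 100 FLOP per byte, while keeping the weights on chip removes the dominant term from the traffic. Real kernels have additional costs (synchronization, gate nonlinearities, input projections) that this model ignores.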