Mixed Precision Training for NLP and Speech Recognition with OpenSeq2Seq

Originally published at: Mixed Precision Training for NLP and Speech Recognition with OpenSeq2Seq | NVIDIA Technical Blog

The success of neural networks thus far has been built on bigger datasets, better theoretical models, and reduced training time. Sequential models, in particular, could stand to benefit even more from these advances. To this end, we created OpenSeq2Seq – an open-source, TensorFlow-based toolkit. OpenSeq2Seq supports a wide range of off-the-shelf models, featuring multi-GPU and…
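For readers who want to see the core idea in code, below is a minimal sketch of mixed precision training with dynamic loss scaling using the generic tf.keras API. This is not the OpenSeq2Seq configuration system; the model, layer sizes, and data are placeholders chosen purely for illustration.

```python
# Minimal sketch of mixed precision training with dynamic loss scaling
# in TensorFlow 2.x. Not the OpenSeq2Seq API; model and data are toy
# placeholders for illustration only.
import tensorflow as tf

# Compute in float16 where safe, keep variables in float32 for stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    # Keep the final layer in float32 so the loss is computed at full precision.
    tf.keras.layers.Dense(10, dtype="float32"),
])

# LossScaleOptimizer multiplies the loss before the backward pass so that
# small float16 gradients do not underflow to zero, then unscales the
# gradients before the weight update.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Placeholder data purely for illustration.
x = tf.random.normal((256, 20))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, epochs=1, batch_size=32)
```

Dynamic loss scaling is the piece most relevant to gradient problems in float16: without it, gradients smaller than roughly 6e-8 underflow to zero, while overflows show up as NaNs.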

Did you run into the NaN gradient problem when training the model?
I mean the `Vanishing Gradient Problem`.