NVIDIA Clocks World’s Fastest BERT Training Time and Largest Transformer Based Model, Paving Path For Advanced Conversational AI

Originally published at: NVIDIA Clocks World’s Fastest BERT Training Time and Largest Transformer Based Model, Paving Path For Advanced Conversational AI | NVIDIA Technical Blog

NVIDIA DGX SuperPOD trains BERT-Large in just 47 minutes, and trains GPT-2 8B, the largest Transformer network ever at 8.3Bn parameters. Conversational AI is an essential building block of human interactions with intelligent machines and applications – from robots and cars, to home assistants and mobile apps. Getting computers to understand human languages, with all their…

It's a very helpful and informative blog.

What can one say about the inference times?

How big is the effort to build an inference machine for the trained GPT-2 8B model?
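For a rough sense of scale (a back-of-envelope sketch, not a figure from the post): 8.3 billion parameters in FP16 already take roughly 15.5 GiB for the weights alone, before activations or any per-token state, so serving this model needs at least a large-memory GPU or a multi-GPU setup. A minimal Python estimate, with the bytes-per-parameter values being standard assumptions rather than NVIDIA-published numbers:

```python
# Back-of-envelope GPU memory estimate for serving GPT-2 8B.
# The 8.3B parameter count comes from the post; the bytes-per-parameter
# figures are common precision assumptions, not official specifications.

PARAMS = 8.3e9  # GPT-2 8B parameter count (from the post)

BYTES_PER_PARAM = {
    "fp32": 4,  # full precision
    "fp16": 2,  # half precision, typical for inference
    "int8": 1,  # quantized inference (hypothetical for this model)
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{dtype}: ~{gib:.1f} GiB for weights alone")

# fp32: ~30.9 GiB, fp16: ~15.5 GiB, int8: ~7.7 GiB -- weights only;
# activations and per-request state come on top of this.
```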

A SuperGLUE entry would be worth a thousand blog posts.