NVIDIA Clocks World’s Fastest BERT Training Time and Largest Transformer Based Model, Paving Path For Advanced Conversational AI

Originally published at: https://developer.nvidia.com/blog/training-bert-with-gpus/

NVIDIA DGX SuperPOD trains BERT-Large in just 47 minutes, and trains GPT-2 8B, the largest Transformer-based network ever, with 8.3 billion parameters. Conversational AI is an essential building block of human interactions with intelligent machines and applications – from robots and cars, to home assistants and mobile apps. Getting computers to understand human languages, with all their…

It's a very helpful and informative blog.

What can one say about the inference times?

How big is the effort for an inference machine with the trained GPT-2 8B model?
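As a rough back-of-envelope sketch (not from the post): the dominant cost for serving a model this size is simply holding the weights in memory. Using the 8.3B parameter count stated above, and standard FP32/FP16 storage sizes per parameter (the precision actually used for serving is an assumption here, not something the post specifies):

```python
# Back-of-envelope memory estimate for serving GPT-2 8B.
# The 8.3B parameter count comes from the post; the per-parameter
# byte sizes are generic FP32/FP16 costs, not NVIDIA's figures.
PARAMS = 8.3e9


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


fp32 = weight_memory_gib(PARAMS, 4)  # ~30.9 GiB
fp16 = weight_memory_gib(PARAMS, 2)  # ~15.5 GiB
print(f"FP32 weights: {fp32:.1f} GiB, FP16 weights: {fp16:.1f} GiB")
```

So even before counting activations and framework overhead, the weights alone exceed a single 16 GB GPU in FP32, which is why half precision or multi-GPU model parallelism comes into play for a model of this scale.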

A SuperGLUE entry would be worth a thousand blog posts.