Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick

Originally published at: https://developer.nvidia.com/blog/blackwell-breaks-the-1000-tps-user-barrier-with-metas-llama-4-maverick/

NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model, the largest and most powerful model available in the Llama 4 collection. This speed was independently…

Awesome work! I am just curious which prompt dataset the AL and speedup in Figure 5 are measured on. Thanks for the help!