Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick

Originally published at: https://developer.nvidia.com/blog/blackwell-breaks-the-1000-tps-user-barrier-with-metas-llama-4-maverick/

NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model, the largest and most powerful model available in the Llama 4 collection. This speed was independently…

Awesome work! I am just curious which prompt dataset the AL and speedup in Figure 5 are measured on. Thanks for the help!