NVIDIA Accelerates Inference on Meta Llama 4 Scout and Maverick

Originally published at: NVIDIA Accelerates Inference on Meta Llama 4 Scout and Maverick | NVIDIA Technical Blog

The newest generation of the popular Llama AI models is here with Llama 4 Scout and Llama 4 Maverick. Accelerated by NVIDIA open-source software, they can achieve over 40K output tokens per second on NVIDIA Blackwell B200 GPUs, and are available to try as NVIDIA NIM microservices. The Llama 4 models are now natively multimodal…
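Since the post mentions the models are available as NVIDIA NIM microservices, here is a minimal sketch of calling one through the NIM OpenAI-compatible API. The base URL matches the NVIDIA API catalog convention; the exact model identifier is an assumption, so check build.nvidia.com for the id of the Llama 4 variant you want.

```python
# Minimal sketch: query a Llama 4 NIM endpoint via its OpenAI-compatible API.
# Assumptions: the NVIDIA API catalog endpoint and the model id shown below --
# verify both on build.nvidia.com before use.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA API catalog endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # key generated at build.nvidia.com
)

completion = client.chat.completions.create(
    model="meta/llama-4-maverick-17b-128e-instruct",  # assumed model id
    messages=[{"role": "user", "content": "Summarize Llama 4's multimodal features."}],
    max_tokens=512,
    temperature=0.7,
    stream=True,  # stream so tokens print as they arrive
)

for chunk in completion:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```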

Hi, thanks for the performance data. Can you clarify some of the parameters behind these results? Specifically, what is the input prompt size, what is the output size, and what is the batch size? Can you also clarify the number of GPUs and the total memory used for this test?