NVIDIA TensorRT-LLM Revs Up Inference for Google Gemma

jwitsoe, February 21, 2024, 1:00pm

Originally published at: https://developer.nvidia.com/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/

NVIDIA is collaborating as a launch partner with Google in delivering Gemma, a newly optimized family of open models built from the same research and technology used to create the Gemini models. An optimized release with TensorRT-LLM gives users the ability to develop with LLMs using only a desktop with an NVIDIA RTX GPU. Created by…

Related topics

Topic	Category / tags	Replies	Views	Activity
Google's New Gemma 2 Model Now Optimized and Available on NVIDIA API Catalog	Technical Blog	2	292	August 28, 2024
Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM, Now Publicly Available	Technical Blog	8	2043	January 25, 2024
NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs	Technical Blog	5	1198	September 27, 2023
Gemma2 support in tensor RT LLM	TensorRT; llm, gemma-2-9b-it	0	178	July 24, 2024
NVIDIA TensorRT-LLM Enhancements Deliver Massive Large Language Model Speedups on NVIDIA H200	Technical Blog	0	453	December 5, 2023
Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model Optimizer on NVIDIA H200 GPUs	Technical Blog; llama	2	161	September 17, 2024
Run Google DeepMind’s Gemma 3n on NVIDIA Jetson and RTX	Announcements; llama	0	184	July 3, 2025
Run Google DeepMind’s Gemma 3n on NVIDIA Jetson and RTX	Announcements; llama	0	175	June 30, 2025
Run Google DeepMind’s Gemma 3n on NVIDIA Jetson and RTX	Technical Blog	1	160	June 26, 2025
Boosting Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server	Technical Blog - South Korea	1	364	May 3, 2024
