Google's New Gemma 2 Model Now Optimized and Available on NVIDIA API Catalog

Originally published at: NVIDIA NIM | gemma-2-27b-it

Gemma 2, the next generation of Google's Gemma models, is now optimized with TensorRT-LLM and packaged as an NVIDIA NIM inference microservice, available through the NVIDIA API Catalog.
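For anyone who wants to try the model right away, here is a minimal sketch of calling it through the API Catalog's OpenAI-compatible endpoint. The base URL and the `google/gemma-2-27b-it` model identifier follow NVIDIA's usual pattern for hosted NIM endpoints but are assumptions here; confirm them against the model page, and supply your own API key.

```python
# Minimal sketch: query the hosted Gemma 2 NIM endpoint via the OpenAI-compatible API.
# Assumes the base URL and model name shown on the NVIDIA API Catalog model page.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",   # assumed API Catalog endpoint
    api_key=os.environ["NVIDIA_API_KEY"],             # key generated from the API Catalog
)

completion = client.chat.completions.create(
    model="google/gemma-2-27b-it",                    # assumed model identifier
    messages=[{"role": "user", "content": "Summarize what a NIM inference microservice is."}],
    temperature=0.5,
    max_tokens=256,
)

print(completion.choices[0].message.content)
```

The same request shape works whether you hit the hosted endpoint or a self-deployed NIM container; only the `base_url` changes.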

That's great news! Gemma 2 with TensorRT-LLM and the NVIDIA NIM inference microservice sounds like a powerful combination for efficient LLM inference. I'm excited to see how it improves performance and deployment options.