Google PaliGemma

Dive into the world where images and text come together with PaliGemma, Google’s vision-language model (VLM)!

PaliGemma is a versatile model built to work with images and text together, making it your go-to buddy for tasks like:

Image captioning: Turn those visuals into captivating stories with PaliGemma’s knack for generating descriptive captions.

Visual question answering: Got burning questions about what you see? PaliGemma’s got your back, providing insightful answers based on the images you throw its way.

Text reading: Whether it’s signs, labels, or handwritten notes, PaliGemma is here to decipher and make sense of all that text within images.

Object detection and segmentation: Spotting objects in images is a breeze with PaliGemma’s sharp eye for detail. Say goodbye to playing “Where’s Waldo?”
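Curious how one model handles all of these tasks? PaliGemma is steered by a short text prefix sent alongside the image. Here is a minimal sketch, assuming the prefix conventions described in the PaliGemma model card ("caption en", "answer en ...", "ocr", "detect ...", "segment ...") and assuming detection results come back as four location tokens per box in y_min, x_min, y_max, x_max order on a 0 to 1023 grid. The helper names `build_prompt` and `parse_detections` are made up for illustration, and a hosted endpoint may expect a different request format, so check its documentation before relying on this.

```python
import re


def build_prompt(task, text="", lang="en"):
    """Return the text prefix for a task (hypothetical helper, assumed prefixes)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "vqa":
        return f"answer {lang} {text}"
    if task == "ocr":
        return "ocr"
    if task == "detect":
        return f"detect {text}"
    if task == "segment":
        return f"segment {text}"
    raise ValueError(f"unknown task: {task}")


# Assumed output shape: "<locYYYY><locXXXX><locYYYY><locXXXX> label ; ..."
LOC_BOX = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(output, width, height):
    """Convert location tokens into pixel boxes (x_min, y_min, x_max, y_max)."""
    boxes = []
    for match in LOC_BOX.finditer(output):
        # Tokens are assumed to be normalized to 1024 bins.
        y0, x0, y1, x1 = (int(match.group(i)) / 1024 for i in range(1, 5))
        boxes.append(
            {
                "label": match.group(5).strip(),
                "box": (x0 * width, y0 * height, x1 * width, y1 * height),
            }
        )
    return boxes


if __name__ == "__main__":
    print(build_prompt("vqa", "What color is the car?"))
    sample = "<loc0256><loc0128><loc0768><loc0896> cat"
    print(parse_detections(sample, width=1024, height=1024))
```

Swapping the task changes only the prefix, so the same image can be captioned, queried, or searched for objects without touching the rest of the request.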

Where’s PaliGemma?
NVIDIA collaborated with Google to optimize the model, and it is now available on the NVIDIA API catalog.
