Models with VLM support, structured output, and tool calling

Hi, I am building an agentic application that requires an LLM that accepts both image and text input and supports tool calling and structured output. Since I am using LangChain's ChatNVIDIA, I used this snippet to find models that fit these criteria:

from langchain_nvidia_ai_endpoints import ChatNVIDIA

models = [
    model.id
    for model in ChatNVIDIA.get_available_models()
    if model.model_type == 'vlm'
    and model.supports_tools
    and model.supports_structured_output
]

print(models)

This gives me an empty list. Is that correct? Are there really no models that support all three features?

Also, I saw that some other providers implement with_structured_output via function calling instead of json_mode or a native structured-output mode. Is this also possible with ChatNVIDIA?
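
For context, here is the kind of call I mean, using ChatOpenAI as an example of a provider that exposes a method parameter (the schema, model name, and prompt are just illustrative):

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Weather(BaseModel):
    """Weather report for a city."""
    city: str = Field(description="Name of the city")
    conditions: str = Field(description="Short weather description")

llm = ChatOpenAI(model="gpt-4o-mini")
# method="function_calling" builds the structured output on top of tool calls
structured_llm = llm.with_structured_output(Weather, method="function_calling")
structured_llm.invoke("Give me a plausible weather report for Tokyo.")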

NVIDIA also supports multimodal inputs, meaning you can provide both images and text for the model to reason over. One example of such a model is nvidia/neva-22b.

https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/#multimodal
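
For example, following the pattern in those docs, you can pass an image inline as a base64 data URL (the file path here is a placeholder):

import base64
from langchain_core.messages import HumanMessage
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Read a local image and encode it as base64 ("example.png" is a placeholder).
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

llm = ChatNVIDIA(model="nvidia/neva-22b")
llm.invoke([
    HumanMessage(content=[
        {"type": "text", "text": "Describe this image:"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ])
])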

Hello! Thank you for your answer. I took a look at this model, and it does not seem to match my needs, as it does not support tool calling. Here is a simple reproducible script that yields an error due to the lack of tool support:

from langchain_core.tools import tool
from langchain_nvidia_ai_endpoints import ChatNVIDIA

@tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    return f"The weather in {city} is sunny."

llm = ChatNVIDIA(model="nvidia/neva-22b", api_key="<your_api_key>").bind_tools([get_weather])

# This call errors because nvidia/neva-22b does not support tool calling.
llm.invoke("What is the weather in Tokyo?")

As I mentioned, I need a model that supports tool calling, is multimodal, and supports structured output.
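
To check whether the empty list comes from the model_type == 'vlm' filter specifically, one sketch is to relax that condition and print what each tool-capable model reports (same attributes as in my first snippet):

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# List every model that reports tool support, along with its type and
# whether it also reports structured output support.
for model in ChatNVIDIA.get_available_models():
    if model.supports_tools:
        print(model.id, model.model_type, model.supports_structured_output)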