Tips for Building a RAG Pipeline with NVIDIA AI LangChain AI Endpoints

Originally published at: Tips for Building a RAG Pipeline with NVIDIA AI LangChain AI Endpoints | NVIDIA Technical Blog

Retrieval-augmented generation (RAG) is a technique that combines information retrieval with a set of carefully designed system prompts to provide more accurate, up-to-date, and contextually relevant responses from large language models (LLMs). By incorporating data from various sources such as relational databases, unstructured document repositories, internet data streams, and media news feeds, RAG can significantly…
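For orientation before the Q&A below, here is a minimal sketch of the kind of chain the blog describes. This is a sketch under assumptions, not the blog's exact code: it assumes the langchain-nvidia-ai-endpoints and langchain-community packages (plus faiss-cpu) are installed, NVIDIA_API_KEY is set in the environment, and the model name is available in the API catalog.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Index a toy document set with NVIDIA-hosted embeddings (toy data, names assumed).
docs = ["Triton supports HTTP/REST and gRPC inference protocols."]
retriever = FAISS.from_texts(docs, embedding=NVIDIAEmbeddings()).as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer only from this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

# Retrieval fills the prompt's context; the question passes through unchanged.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(chain.invoke("What interfaces does Triton support?"))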



Good morning,
I am trying out the RAG implementation, following the instructions in your blog.
I can't connect with from langchain_nvidia_ai_endpoints import ChatNVIDIA; I can only reach the models as specified by the "NVIDIA API catalog", through from openai import OpenAI.

If I make the invocation with ChatNVIDIA and the URL provided in the "NVIDIA API catalog":

from langchain_nvidia_ai_endpoints import ChatNVIDIA
llm = ChatNVIDIA(model="meta/llama3-70b-instruct", nvidia_api_key=nvapi_key)
result = llm.invoke("What interfaces does Triton support?")
print(result.content)

The answer is: "ValueError: Unknown model name meta/llama3-70b-instruct specified." Available models: ai-llama3-70b - a88f115a-4a47-4381-ad62-ca25dc33dc1b

If I make the invocation with model="ai-llama3-70b":

llm = ChatNVIDIA(model="ai-llama3-70b", nvidia_api_key=nvapi_key)

The answer is: Exception: [404] Not Found \n The model gpt43b does not exist.

Note that the blog uses the model "ai-llama2-70b".

You can follow a full notebook version of the blog here: Build a RAG chain by generating embeddings for NVIDIA Triton documentation — NVIDIA Generative AI Examples 24.4.0 documentation

To see the supported models, call ChatNVIDIA.get_available_models(), as in the sketch below.
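A minimal sketch (output varies by API key and library version; the id attribute is assumed from recent langchain-nvidia-ai-endpoints releases):

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Print the model names this key can use, then pass one of them
# as the model argument when constructing ChatNVIDIA.
for model in ChatNVIDIA.get_available_models():
    print(model.id)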


Hi - I am facing an issue using ChatNVIDIA in LangChain with a locally deployed LLM on NIM.

My code:

from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.**********/v1/completions",
    model="mistralai/mistral-7b-instruct-v0.3"
)
result = llm.invoke("Write a ballad about LangChain.")

This code throws the error below:

SSLError: HTTPSConnectionPool(host='ashish-mistral-nim-deploy-1-predictor.***********', port=443): Max retries exceeded with url: /v1/completions/chat/completions (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

I am not sure how to apply the SSL certificate. Any guidance here?

The following curl-based API request works perfectly fine, though (a Python requests equivalent is sketched after it).

curl --cacert test.crt -X 'POST' 'https://ashish-mistral-nim-deploy-1-predictor.**********/v1/completions' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistralai/mistral-7b-instruct-v0.3", "prompt": "Write a ballad about LangChain.", "max_tokens": 64}'
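For comparison, a requests-based sketch of the same call passes the CA bundle through the verify argument, which plays the role of curl's --cacert (this assumes test.crt is the server's CA chain, as in the curl command above):

import requests

url = "https://ashish-mistral-nim-deploy-1-predictor.**********/v1/completions"
payload = {
    "model": "mistralai/mistral-7b-instruct-v0.3",
    "prompt": "Write a ballad about LangChain.",
    "max_tokens": 64,
}
# verify points at the CA chain file, mirroring curl's --cacert test.crt.
response = requests.post(url, json=payload, verify="test.crt", timeout=60)
print(response.json())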

Hi Ashish,

When running in LangChain, you need to remove "/completions" from the base_url.

See this line in the notebook example: llm = ChatNVIDIA(base_url="http://0.0.0.0:8000/v1", model="meta/llama3-8b-instruct", temperature=0.1, max_tokens=1000, top_p=1.0)

Thanks,
Amit

Hi - thanks for the response. Your hint did not help here; the API call below still fails:

llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.******/v1",
    model="mistralai/mistral-7b-instruct-v0.3"
)

Error:
SSLError: HTTPSConnectionPool(host='ashish-mistral-nim-deploy-1-*********', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

The following piece of code works perfectly fine as well:

import requests

# NOTE: verify=False disables TLS certificate verification entirely.
langchain_nvidia_mistral_api_url = "https://ashish-mistral-nim-deploy-1-*******/v1/completions"
tokenheaders = {"Content-Type": "application/json", "accept": "application/json"}
payload = {"model": "mistralai/mistral-7b-instruct-v0.3", "prompt": "Tell me something about Langchain.", "max_tokens": 64}
response = requests.post(langchain_nvidia_mistral_api_url, json=payload, headers=tokenheaders, verify=False)

print(response)
print(response.json())

How do I pass my SSL CA certificate to the LangChain API call? Or how can I specify verify=False in the ChatNVIDIA call? Just adding verify=False does not work.

Please consider my open query closed. Here is the code that works for me; I am posting it here for the benefit of others as well:

  1. Create the right chain of certificates.
  2. Reference the certificate in the API call as below.

import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Point both the process-level SSL_CERT_FILE and the client at the CA chain.
os.environ['SSL_CERT_FILE'] = '/usr/local/share/ca-certificates/chain.pem'
llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.************/v1",
    model="mistralai/mistral-7b-instruct-v0.3",
    verify="/usr/local/share/ca-certificates/chain.pem"
)
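As a side note for anyone adapting this: the standard REQUESTS_CA_BUNDLE environment variable is honored by the requests stack underneath the client, so the following variant (a sketch under that assumption) should also pick up the custom CA without the extra constructor argument:

import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# requests reads REQUESTS_CA_BUNDLE at call time, so the custom CA chain
# applies to every HTTPS request the client makes.
os.environ["REQUESTS_CA_BUNDLE"] = "/usr/local/share/ca-certificates/chain.pem"

llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.************/v1",
    model="mistralai/mistral-7b-instruct-v0.3",
)
print(llm.invoke("Write a ballad about LangChain.").content)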


Thank you Ashish for sharing this solution with the developer community!