Tips for Building a RAG Pipeline with NVIDIA AI LangChain AI Endpoints

Originally published at: Tips for Building a RAG Pipeline with NVIDIA AI LangChain AI Endpoints | NVIDIA Technical Blog

Retrieval-augmented generation (RAG) is a technique that combines information retrieval with a set of carefully designed system prompts to provide more accurate, up-to-date, and contextually relevant responses from large language models (LLMs). By incorporating data from various sources such as relational databases, unstructured document repositories, internet data streams, and media news feeds, RAG can significantly…
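For orientation before the Q&A below, here is a minimal sketch of the kind of chain the blog describes. This is a sketch under assumptions, not the blog's exact code: it assumes the langchain-nvidia-ai-endpoints and langchain-community packages (plus faiss-cpu) are installed, NVIDIA_API_KEY is set in the environment, and the model name is available in the API catalog.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Index a toy document set with NVIDIA-hosted embeddings (toy data, names assumed).
docs = ["Triton supports HTTP/REST and gRPC inference protocols."]
retriever = FAISS.from_texts(docs, embedding=NVIDIAEmbeddings()).as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer only from this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

# Retrieval fills the prompt's context; the question passes through unchanged.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(chain.invoke("What interfaces does Triton support?"))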



Good morning,
I am trying out the RAG implementation, following the instructions in your blog.
I can't connect with from langchain_nvidia_ai_endpoints import ChatNVIDIA; I can only reach the models as specified by the "NVIDIA API catalog", through from openai import OpenAI.

If I make the invocation with ChatNVIDIA and the URL provided in the "NVIDIA API catalog":

from langchain_nvidia_ai_endpoints import ChatNVIDIA
llm = ChatNVIDIA(model="meta/llama3-70b-instruct", nvidia_api_key=nvapi_key)
result = llm.invoke("What interfaces does Triton support?")
print(result.content)

The answer is: "ValueError: Unknown model name meta/llama3-70b-instruct specified." Available models: ai-llama3-70b - a88f115a-4a47-4381-ad62-ca25dc33dc1b

If I make the invocation with model="ai-llama3-70b":

llm = ChatNVIDIA(model="ai-llama3-70b", nvidia_api_key=nvapi_key)

The answer is: Exception: [404] Not Found \n The model gpt43b does not exist.

Note that the blog uses the model "ai-llama2-70b".

You can follow a full notebook version of the blog here: Build a RAG chain by generating embeddings for NVIDIA Triton documentation — NVIDIA Generative AI Examples 24.4.0 documentation

To see the supported models, call ChatNVIDIA.get_available_models(), as in the sketch below.
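A minimal sketch (output varies by API key and library version; the id attribute is assumed from recent langchain-nvidia-ai-endpoints releases):

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Print the model names this key can use, then pass one of them
# as the model argument when constructing ChatNVIDIA.
for model in ChatNVIDIA.get_available_models():
    print(model.id)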


Hi - I am facing an issue using ChatNVIDIA in LangChain with a locally deployed LLM on NIM.

My code:

from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.**********/v1/completions",
    model="mistralai/mistral-7b-instruct-v0.3"
)
result = llm.invoke("Write a ballad about LangChain.")

This code throws the error below:

SSLError: HTTPSConnectionPool(host='ashish-mistral-nim-deploy-1-predictor.***********', port=443): Max retries exceeded with url: /v1/completions/chat/completions (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

I am not sure how to apply the SSL certificate. Any guidance here?

The following curl-based API request works perfectly fine, though (a Python requests equivalent is sketched after it).

curl --cacert test.crt -X 'POST' 'https://ashish-mistral-nim-deploy-1-predictor.**********/v1/completions' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistralai/mistral-7b-instruct-v0.3", "prompt": "Write a ballad about LangChain.", "max_tokens": 64}'
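For comparison, a requests-based sketch of the same call passes the CA bundle through the verify argument, which plays the role of curl's --cacert (this assumes test.crt is the server's CA chain, as in the curl command above):

import requests

url = "https://ashish-mistral-nim-deploy-1-predictor.**********/v1/completions"
payload = {
    "model": "mistralai/mistral-7b-instruct-v0.3",
    "prompt": "Write a ballad about LangChain.",
    "max_tokens": 64,
}
# verify points at the CA chain file, mirroring curl's --cacert test.crt.
response = requests.post(url, json=payload, verify="test.crt", timeout=60)
print(response.json())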

Hi Ashish,

When running in LangChain, you need to remove "/completions" from the base_url.

See this line in the notebook example: llm = ChatNVIDIA(base_url="http://0.0.0.0:8000/v1", model="meta/llama3-8b-instruct", temperature=0.1, max_tokens=1000, top_p=1.0)

Thanks,
Amit

Hi - thanks for the response. Your hint did not help here; the API call below still fails:

llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.******/v1",
    model="mistralai/mistral-7b-instruct-v0.3"
)

Error:
SSLError: HTTPSConnectionPool(host='ashish-mistral-nim-deploy-1-*********', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

The following piece of code works perfectly fine as well:

import requests

# NOTE: verify=False disables TLS certificate verification entirely.
langchain_nvidia_mistral_api_url = "https://ashish-mistral-nim-deploy-1-*******/v1/completions"
tokenheaders = {"Content-Type": "application/json", "accept": "application/json"}
payload = {"model": "mistralai/mistral-7b-instruct-v0.3", "prompt": "Tell me something about Langchain.", "max_tokens": 64}
response = requests.post(langchain_nvidia_mistral_api_url, json=payload, headers=tokenheaders, verify=False)

print(response)
print(response.json())

How do I pass my SSL CA certificate to the LangChain API call? Or how can I specify verify=False in the ChatNVIDIA call? Just adding verify=False does not work.

Please consider my open query closed. Here is the code that works for me; I am posting it here for the benefit of others as well:

  1. Create the right chain of certificates.
  2. Reference the certificate in the API call as below.

import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Point both the process-level SSL_CERT_FILE and the client at the CA chain.
os.environ['SSL_CERT_FILE'] = '/usr/local/share/ca-certificates/chain.pem'
llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.************/v1",
    model="mistralai/mistral-7b-instruct-v0.3",
    verify="/usr/local/share/ca-certificates/chain.pem"
)
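As a side note for anyone adapting this: the standard REQUESTS_CA_BUNDLE environment variable is honored by the requests stack underneath the client, so the following variant (a sketch under that assumption) should also pick up the custom CA without the extra constructor argument:

import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# requests reads REQUESTS_CA_BUNDLE at call time, so the custom CA chain
# applies to every HTTPS request the client makes.
os.environ["REQUESTS_CA_BUNDLE"] = "/usr/local/share/ca-certificates/chain.pem"

llm = ChatNVIDIA(
    base_url="https://ashish-mistral-nim-deploy-1-predictor.************/v1",
    model="mistralai/mistral-7b-instruct-v0.3",
)
print(llm.invoke("Write a ballad about LangChain.").content)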


Thank you Ashish for sharing this solution with the developer community!