Building RAG Agents with LLMs: final assessment support

I am unable to get the assessment running in the Gradio notebook; below is my server_app.py file, please help. Also, I'm almost out of the 32 hours of server time.
@vkudlay
P.S. Sorry for the bad code.

%%writefile server_app.py
# https://python.langchain.com/docs/langserve#server
from functools import partial
from operator import itemgetter

import gradio as gr
from fastapi import FastAPI
from langchain.document_transformers import LongContextReorder
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch, RunnableLambda
from langchain_core.runnables.passthrough import RunnableAssign
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langserve import add_routes

## Embedding model served by the course's llm_client container
embedder = NVIDIAEmbeddings(
    model="nvidia/embed-qa-4", truncate="END",
    base_url="http://llm_client:9000/v1"
)

docstore = FAISS.load_local("docstore_index", embedder, allow_dangerous_deserialization=True)
docs = list(docstore.docstore._dict.values())
#####################################################################

# NVIDIAEmbeddings.get_available_models(base_url="http://llm_client:9000/v1")


# ChatNVIDIA.get_available_models(base_url="http://llm_client:9000/v1")
instruct_llm = ChatNVIDIA(
    model="mistralai/mixtral-8x22b-instruct-v0.1",
    base_url="http://llm_client:9000/v1"
)
llm = instruct_llm | StrOutputParser()

#####################################################################

def docs2str(docs, title="Document"):
    """Useful utility for making chunks into context string. Optional, but useful"""
    out_str = ""
    for doc in docs:
        doc_name = getattr(doc, 'metadata', {}).get('Title', title)
        if doc_name: out_str += f"[Quote from {doc_name}] "
        out_str += getattr(doc, 'page_content', str(doc)) + "\n"
    return out_str

chat_prompt = ChatPromptTemplate.from_template(
    "You are a document chatbot. Help the user as they ask questions about documents."
    " The user just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
    "\n\nUser Question: {input}"
)

def output_puller(inputs):
    """Output generator. Useful if your chain returns a dictionary with key 'output'"""
    print('inputs', inputs)
    yield inputs
    '''if isinstance(inputs, dict):
        inputs = [inputs]
    for token in inputs:
        if token.get('output'):
            yield token.get('output')
        else:
            yield ""'''

#####################################################################
## TODO: Pull in your desired RAG Chain. Memory not necessary

## Chain 1 Specs: "Hello World" -> retrieval_chain 
##   -> {'input': <str>, 'context' : <str>}
long_reorder = RunnableLambda(LongContextReorder().transform_documents)  ## GIVEN
##context_getter = itemgetter('input') | docstore.as_retriever() | long_reorder | docs2str  ## TODO
retrieval_chain = RunnableLambda(lambda x: docstore.as_retriever())
##retrieval_chain = {'input' : (lambda x: x)} | RunnableAssign({'context' : context_getter})

## Chain 2 Specs: retrieval_chain -> generator_chain 
##   -> {"output" : <str>, ...} -> output_puller
generator_chain = chat_prompt | llm ## TODO
#generator_chain = RunnableAssign({'output' : generator_chain})# | output_puller  ## GIVEN

## END TODO
#####################################################################

rag_chain = retrieval_chain | generator_chain


app = FastAPI(
  title="LangChain Server",
  version="1.0",
  description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(
    app,
    instruct_llm,
    path="/basic_chat",
)

add_routes(
    app,
    generator_chain,
    path="/generator",
)

add_routes(
    app,
    retrieval_chain,
    path="/retriever",
)

## Only needed if this were run as a standalone Python file...
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9012)

The individual chains seem to work as expected when run via RemoteRunnable with streaming.

@vkudlay Hey, if you're not too busy, could you give me some assistance?

Hey @auzuha! Sorry about the delay; SIGGRAPH made my DMs a black hole for around a week. This looks very close! The first thing I see is the retrieval_chain line, which is returning the docstore's retriever object instead of the retrieval from said docstore. Remove the RunnableLambda(lambda x: ...) wrapper and the retriever will return the retrieval… instead of itself.
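In other words, the commented-out hints already in your file are the right shape; uncommented, they give something like this (a sketch using the docstore, long_reorder, and docs2str you defined above):

context_getter = (
    itemgetter('input')
    | docstore.as_retriever()
    | long_reorder
    | RunnableLambda(docs2str)
)
retrieval_chain = {'input': (lambda x: x)} | RunnableAssign({'context': context_getter})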

Hi @vkudlay, I see. Thanks for the tip. Unfortunately, I have exhausted my 32-hour runtime limit on this course. Is it possible to allocate a bit more time? If so, could you help me with this?

Sure! Added some more time!

Hi @vkudlay,
Thanks for adding the time. I tried some changes, and unfortunately I'm still stuck at one point.

Since docstore is a FAISS index loaded from local storage,

docstore = FAISS.load_local("docstore_index", embedder, allow_dangerous_deserialization=True)

I made the retrieval chain like this:

retrieval_chain = RunnableLambda(lambda x: docstore.similarity_search(x))

However, when I run the server_app.py file and check the retriever route using a LangServe RemoteRunnable:

from langserve import RemoteRunnable
from langchain_core.output_parsers import StrOutputParser

llm = RemoteRunnable("http://0.0.0.0:9012/retriever/") | StrOutputParser()
for token in llm.stream("Hello World! How is it going?"):
    print(token, end='')

I’m getting this error:

Exception: [404] Not Found
    | Inference error
    | RequestID: d3762ab8-7824-4de1-9134-6845959ba348

I’m not sure what is causing this error.

Have you tried just sending back as_retriever (no RunnableLambda or anything like that)?

Yes,

retrieval_chain = docstore.as_retriever()

Gives me

aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url=URL('http://llm_client:9000/v1/chat/completions')

Oh, odd. Hate to ask, but can you try not using 8x22b (maybe try llama-3.1 or 8x7b)? This sounds like a model input glitch I've heard other people bring up. TL;DR: some documents trip some weird 8x22b restrictions.
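For example, the swap would look something like this (the exact model id is an assumption on my end; check ChatNVIDIA.get_available_models(base_url="http://llm_client:9000/v1") for what your llm_client actually serves):

instruct_llm = ChatNVIDIA(
    model="mistralai/mixtral-8x7b-instruct-v0.1",  ## or a llama-3.1 id from get_available_models
    base_url="http://llm_client:9000/v1"
)
llm = instruct_llm | StrOutputParser()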

This sure is weird; once I switched to another model, it seems to work.

Yeah, I need to look into that detail. It’s on my plate, but I’ve been having a hard time replicating the issue and need to dedicate some time to it. But yeah, sorry about that. Do let me know if something else messes up or if you have any questions

It looks like the model is responding with
‘I’m ready to help. However, I don’t see the actual question in your message. Please go ahead and share the question, and I’ll do my best to answer it based on the provided documents.’
most of the time. I’m sure I haven’t changed the chat prompt at all; here it is:

chat_prompt = ChatPromptTemplate.from_template(
    "You are a document chatbot. Help the user as they ask questions about documents."
    " The user just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
    "\n\nUser Question: {input}"
)
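One thing I can try is rendering the prompt locally to see exactly what string the model receives; if {input} comes through empty by the time it reaches the prompt, that would explain the response. A sketch (the sample values below are placeholders):

## Render the prompt with placeholder values to inspect what the model actually sees
preview = chat_prompt.invoke({
    "input": "Hello World! How is it going?",
    "context": docs2str(docs[:2]),
})
print(preview.to_string())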