Don't understand how to finish - DLI Course 'Building RAG Agents for LLMs'

Hi @vkudlay,

these are my code changes for notebook 35; I guess I'm missing something,
because the server is running but the assessment is not completing.

%%writefile server_app.py
# 🦜️🏓 LangServe | 🦜️🔗 LangChain

from operator import itemgetter

from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain.document_transformers import LongContextReorder
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnableLambda, RunnableBranch
from langchain_core.runnables.passthrough import RunnableAssign
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langserve import add_routes

NVIDIAEmbeddings.get_available_models(base_url="http://llm_client:9000/v1")
llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
embedder = NVIDIAEmbeddings(
    model="nvidia/embed-qa-4", truncate="END",
    base_url="http://llm_client:9000/v1"
)

def output_puller(inputs):
    """Output generator. Useful if your chain returns a dictionary with key 'output'."""
    if isinstance(inputs, dict):
        inputs = [inputs]
    for token in inputs:
        if token.get('output'):
            yield token.get('output')

def docs2str(docs, title="Document"):
    """Useful utility for making chunks into a context string. Optional, but useful."""
    out_str = ""
    for doc in docs:
        doc_name = getattr(doc, 'metadata', {}).get('Title', title)
        if doc_name:
            out_str += f"[Quote from {doc_name}] "
        out_str += getattr(doc, 'page_content', str(doc)) + "\n"
    return out_str

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple api server using Langchain's Runnable interfaces",
)

chat_prompt = ChatPromptTemplate.from_template(
    "You are a document chatbot. Help the user as they ask questions about documents."
    " The user just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
    "\n\nUser Question: {input}"
)

add_routes(
    app,
    llm,
    path="/basic_chat",
)

long_reorder = RunnableLambda(LongContextReorder().transform_documents)
docstore = FAISS.load_local("docstore_index", embedder, allow_dangerous_deserialization=True)

context_getter = itemgetter('input') | docstore.as_retriever() | long_reorder | docs2str
retrieval_chain = {'input': (lambda x: x)} | RunnableAssign({'context': context_getter})
add_routes(app, retrieval_chain, path="/retriever")

generator_chain = chat_prompt | llm  ## TODO
generator_chain = {"output": generator_chain} | RunnableLambda(output_puller)  ## GIVEN
add_routes(app, generator_chain, path="/generator")

## Might be encountered if this were for a standalone python file...
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9012)
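As a quick sanity check (not part of the assessment), the docs2str helper above can be exercised in isolation. This sketch uses SimpleNamespace as a hypothetical stand-in for LangChain's Document class, with made-up content:

```python
# Standalone check of the docs2str helper, with SimpleNamespace standing in
# for LangChain's Document class (test data is made up for illustration).
from types import SimpleNamespace

def docs2str(docs, title="Document"):
    out_str = ""
    for doc in docs:
        doc_name = getattr(doc, 'metadata', {}).get('Title', title)
        if doc_name:
            out_str += f"[Quote from {doc_name}] "
        out_str += getattr(doc, 'page_content', str(doc)) + "\n"
    return out_str

docs = [
    SimpleNamespace(metadata={"Title": "FAISS Paper"}, page_content="FAISS is fast."),
    SimpleNamespace(metadata={}, page_content="Second chunk."),
]
print(docs2str(docs))
# [Quote from FAISS Paper] FAISS is fast.
# [Quote from Document] Second chunk.
```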

Nice! Getting closer! For the two sides to work together properly, take a look at the actual server code (I believe server_blocks.py) to see what the RemoteRunnables are already doing. You'll see that your code is running into issues like {'input': {'input': ...}}, etc. This is also discussed at the end of Notebook 3.
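To see the {'input': {'input': ...}} issue concretely, here is a plain-Python sketch with no LangChain involved; both function names are illustrative stand-ins for the real chains, not course code:

```python
# Plain-Python sketch: wrapping the input on both the server and the
# frontend double-nests the payload. Both functions are hypothetical
# stand-ins for the actual chains.

def server_retriever_route(x):
    # the server-side chain starts with {'input': (lambda x: x)}
    return {"input": x}

def frontend_call(x):
    # the frontend's retrieval_chain also starts with {'input': (lambda x: x)}
    return server_retriever_route({"input": x})

print(frontend_call("Tell me about RAG"))
# {'input': {'input': 'Tell me about RAG'}}
```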

Hello, have you managed to solve your problem? Also, I think it's somewhat odd that we need to deal with such fiddly issues in this course; it's supposed to be about RAG and LangChain, so why is that?

Not yet

Hi @vkudlay, thanks for your response,
but it's still unclear to me (and, I guess, to everyone).
Anyway, I didn't find server_blocks;
there is frontend_blocks,
but as I mentioned, I'm not sure what to do.

Hi @vkudlay, can you assist?
It's not clear how to proceed.

Can you assist, @tdahlin @vkudlay @TomNVIDIA?

It's still unclear to me (and, I guess, to everyone).
Anyway, I didn't find server_blocks;
there is frontend_blocks,
but as I mentioned, I'm not sure what to do.

Hey @user157430. Yup, that’s the one.

From notebook 3, this is the code that is actually using your endpoints, which are running in the server:

## Necessary Endpoints
chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),  ## For the final assessment
    'generator' : RemoteRunnable("http://lab:9012/generator/"),  ## For the final assessment
}

basic_chain = chains_dict['basic']

## Retrieval-Augmented Generation Chain

retrieval_chain = (
    {'input' : (lambda x: x)}
    | RunnableAssign(
        {'context' : itemgetter('input') 
        | chains_dict['retriever'] 
        | LongContextReorder().transform_documents
        | docs2str
    })
)

output_chain = RunnableAssign({"output" : chains_dict['generator'] }) | output_puller
rag_chain = retrieval_chain | output_chain
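For intuition, the state that the frontend's rag_chain accumulates can be walked through with plain dicts (no LangChain; all string values below are made up for illustration):

```python
# Plain-dict walkthrough of the frontend rag_chain's state. Each step mirrors
# one runnable in the chain; the string values are illustrative only.

state = "Tell me about FAISS"                      # raw user input
state = {"input": state}                           # {'input': (lambda x: x)}
state = {**state,                                  # RunnableAssign({'context': ...})
         "context": "[Quote from Doc] FAISS is a vector store.\n"}
state = {**state,                                  # RunnableAssign({'output': ...})
         "output": "FAISS is a library for vector search."}
final = state["output"]                            # output_puller yields this value
print(final)
```

Note that each RunnableAssign only adds a key; the original 'input' stays in the state dict the whole way through, which is why the generator route can still read {input} and {context} from it.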

Your current pipeline effectively functions like the following (same with the retrieval chain):

output_chain = RunnableAssign(
      ## What is being used:
      # {"output" : chains_dict['generator'] }
      ## What it should be doing:
      # {"output": chat_prompt | llm}
      ## What your implementation is doing:
      {"output": {"output": chat_prompt | llm} | RunnableLambda(output_puller)}
) | output_puller
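The practical consequence can be demonstrated with output_puller alone (plain Python, made-up generation string): wrapping and pulling once yields the raw text, while wrapping and pulling a second time leaves the frontend with a nested list instead of the text it expects:

```python
# Demonstration of the double wrap-and-pull, using the course's output_puller
# and a made-up generation string.

def output_puller(inputs):
    if isinstance(inputs, dict):
        inputs = [inputs]
    for token in inputs:
        if token.get('output'):
            yield token.get('output')

generation = "RAG stands for retrieval-augmented generation."

# Intended: the server returns the raw generation; the frontend wraps it in
# {"output": ...} and pulls exactly once.
once = list(output_puller({"output": generation}))
print(once)    # ['RAG stands for retrieval-augmented generation.']

# Over-wrapped: the server already wrapped and pulled, so the frontend's
# wrap-and-pull runs on top of it a second time.
server_side = list(output_puller({"output": generation}))
twice = list(output_puller({"output": server_side}))
print(twice)   # [['RAG stands for retrieval-augmented generation.']]
```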

@kolesnyk.am Haha, yeah, I'll be the first to admit that this course is much more of an LLM orchestration course than a pure RAG course. Coincidentally, we just released a more standard course with RAG as the first-class focus: Techniques for Improving the Effectiveness of RAG Systems | NVIDIA.

That other course focuses more heavily on doing more and more RAG, whereas this course focuses on contextualizing RAG as just one option among many that can coexist in an application (and then going with RAG for the final project, since it wraps things up nicely without getting too niche).

Hi @vkudlay, can you assist?
It's not clear how to proceed.
Should I use this code?

Help me with this

Hi @vkudlay, can you assist?
It's not clear how to proceed.
Should I use this code?

Hey @user157430. Unfortunately, I can’t get any closer to the solution without explicitly posting it (which I do not want to do). Please help me by explaining what issues you’re running into, and I can try to rephrase my message to address concerns.

Hi @vkudlay,
it's just not clear.
Should I use this code?
It's not clear what to do or how to complete this task.

@user157430 Your solution above (here) is close:

retrieval_chain = {'input': (lambda x: x)} | RunnableAssign({'context': context_getter})
add_routes(app, retrieval_chain, path="/retriever")   ## goes into chains_dict['retriever']

generator_chain = chat_prompt | llm
generator_chain = {"output": generator_chain} | RunnableLambda(output_puller)
add_routes(app, generator_chain, path="/generator")   ## goes into chains_dict['generator']

It is encountering problems because it does not work with the frontend server, which is doing this:

## Necessary Endpoints
chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),  ## For the final assessment
    'generator' : RemoteRunnable("http://lab:9012/generator/"),  ## For the final assessment
}

basic_chain = chains_dict['basic']

## Retrieval-Augmented Generation Chain

retrieval_chain = (
    {'input' : (lambda x: x)}
    | RunnableAssign(
        {'context' : itemgetter('input') 
        | chains_dict['retriever'] 
        | LongContextReorder().transform_documents
        | docs2str
    })
)

output_chain = RunnableAssign({"output" : chains_dict['generator'] }) | output_puller
rag_chain = retrieval_chain | output_chain

Because of how your solution is implemented, the frontend’s output chain is functioning like this since it is using your generator route:

output_chain = RunnableAssign({"output": {"output": chat_prompt | llm} | RunnableLambda(output_puller)}) | output_puller
## Recall: generator_chain = {"output": chat_prompt | llm} | RunnableLambda(output_puller)

even though it should be functioning like this:

output_chain = RunnableAssign({"output": chat_prompt | llm}) | output_puller

The retriever chain is misbehaving in a similar way.