LangServe problem in the assessment for "Building RAG Agents with LLMs"

Hi everyone. I was trying to finish the course "Building RAG Agents with LLMs", but I ran into some problems. In 08_evaluation.ipynb, I open the Gradio frontend, but when I click Evaluate I receive the following error:

Generating Synthetic QA Pair:
...
Gradio Stream failed: [Errno 111] Connection refused

Metric score of 0.0, while 0.60 is required
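
To rule out the frontend itself, I first checked whether anything was even listening on the server port. This is just a quick sketch; the host and port (localhost:9012) are what I believe the course environment uses, and /docs is the interactive page FastAPI serves by default:

import requests

# Quick reachability check for the LangServe server.
# Host/port assumed from the course setup; /docs is FastAPI's default Swagger page.
try:
    r = requests.get("http://localhost:9012/docs", timeout=5)
    print("Server reachable, status:", r.status_code)
except requests.exceptions.ConnectionError as err:
    print("Server not reachable:", err)

If this raises a ConnectionError, nothing is actually serving on that port, which would match the [Errno 111] above.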

Also, I was trying to solve this in 09_langserve.ipynb. The server was running in that notebook, and I created another file to test some requests. The following code is from 09_langserve.ipynb:

%%writefile server_app.py
# https://python.langchain.com/docs/langserve#server
from fastapi import FastAPI
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langserve import add_routes, RemoteRunnable

## May be useful later
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.prompt_values import ChatPromptValue
from langchain_core.runnables import RunnableLambda, RunnableBranch, RunnablePassthrough
from langchain_core.runnables.passthrough import RunnableAssign
from langchain_community.document_transformers import LongContextReorder
from functools import partial
from operator import itemgetter

from langchain_community.vectorstores import FAISS

## TODO: Make sure to pick your LLM and do your prompt engineering as necessary for the final assessment
embedder = NVIDIAEmbeddings(model="nvidia/nv-embed-v1", truncate="END")
instruct_llm = ChatNVIDIA(model="meta/llama3-8b-instruct")

llm = instruct_llm | StrOutputParser()

docstore = FAISS.load_local("docstore_index", embedder, allow_dangerous_deserialization=True)
docs = list(docstore.docstore._dict.values())

def docs2str(docs, title="Document"):
    """Useful utility for making chunks into context string. Optional, but useful"""
    out_str = ""
    for doc in docs:
        doc_name = getattr(doc, 'metadata', {}).get('Title', title)
        if doc_name: out_str += f"[Quote from {doc_name}] "
        out_str += getattr(doc, 'page_content', str(doc)) + "\n"
    return out_str

chat_prompt = ChatPromptTemplate.from_template(
    "You are a document chatbot. Help the user as they ask questions about documents."
    " User messaged just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
    "\n\nUser Question: {input}"
)

def output_puller(inputs):
    """"Output generator. Useful if your chain returns a dictionary with key 'output'"""
    if isinstance(inputs, dict):
        inputs = [inputs]
    for token in inputs:
        if token.get('output'):
            yield token.get('output')

chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),  ## For the final assessment
    'generator' : RemoteRunnable("http://lab:9012/generator/"),  ## For the final assessment
}
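## NOTE: These RemoteRunnables point at port 9012, the same port this file
## serves on (in the course environment, "lab" appears to resolve to this
## machine), so the served chains issue HTTP calls back to their own server.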

basic_chain = chains_dict['basic']

## Retrieval-Augmented Generation Chain

retrieval_chain = (
    {'input': (lambda x: x)}
    | RunnableAssign({'context': (
        itemgetter('input')
        | chains_dict['retriever']
        | LongContextReorder().transform_documents
        | docs2str
    )})
)

output_chain = RunnableAssign({"output" : chains_dict['generator'] }) | output_puller
rag_chain = retrieval_chain | output_chain

app = FastAPI(
  title="LangChain Server",
  version="1.0",
  description="A simple api server using Langchain's Runnable interfaces",
)

## PRE-ASSESSMENT: Run as-is and see the basic chain in action

add_routes(
    app,
    llm,
    path="/basic_chat",
)

## ASSESSMENT TODO: Implement these components as appropriate

add_routes(
    app,
    retrieval_chain,
    path="/retriever",
)

add_routes(
    app,
    output_chain,
    path="/generator",
)

## Standard entrypoint when this is run as a standalone Python file
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9012)

And the following is the test code:

from langserve import RemoteRunnable
from langchain_core.output_parsers import StrOutputParser

llm = RemoteRunnable("http://0.0.0.0:9012/basic_chat/") | StrOutputParser()
for token in llm.stream("Hello World! How is it going?"):
    print(token, end='')

retrieval_llm = RemoteRunnable("http://0.0.0.0:9012/retriever/") | StrOutputParser()
response = retrieval_llm.invoke({"input": "What is the topic of the document?"})
print(response)

generator_llm = RemoteRunnable("http://0.0.0.0:9012/generator/") | StrOutputParser()
response = generator_llm.invoke({"input": "Hello World! How is it going?"})
print(response)

The problem in 09_langserve.ipynb is that it only works if I serve llm in the server code. With basic_chain, retrieval_chain, output_chain, or rag_chain, I run the respective cell, but it keeps running forever: nothing is printed, and nothing shows up in the server terminal.
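
To rule out RemoteRunnable itself, I also tried calling an endpoint with plain HTTP. This is a minimal sketch based on my understanding that LangServe's add_routes exposes /invoke, /stream, and /batch under each registered path, with the chain input passed under the "input" key:

import requests

# Call the LangServe invoke endpoint directly, bypassing RemoteRunnable.
# add_routes() exposes POST /<path>/invoke expecting {"input": <chain input>}.
resp = requests.post(
    "http://0.0.0.0:9012/basic_chat/invoke",
    json={"input": "Hello World! How is it going?"},
    timeout=30,
)
print(resp.status_code)
print(resp.json())

Swapping /basic_chat for /retriever or /generator exercises the other routes the same way.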

I looked at other posts in this forum, but they didn't help. I would appreciate any help with these notebooks. In case there's something wrong in the rest of the code, I created a repository with the full code.

Hello,

Welcome to the forums! I have forwarded your issue to the DLI team for investigation. Please note that response times could be much longer than usual during the holidays.

Tom