LangServe problem in Assessment, "Building RAG Agents with LLMs"

Hi everyone. I was trying to finish the course Building RAG Agents with LLMs, but I ran into some problems. In 08_evaluation.ipynb, I open the Gradio frontend, but when I click Evaluate I receive the following error:

Generating Synthetic QA Pair:
...
Gradio Stream failed: [Errno 111] Connection refused

Metric score of 0.0, while 0.60 is required

Also, I was trying to debug this in 09_langserve.ipynb. The server was running in that notebook, and I created a separate file to send some test requests. The following code is from 09_langserve.ipynb:

%%writefile server_app.py
# https://python.langchain.com/docs/langserve#server
from fastapi import FastAPI
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langserve import add_routes, RemoteRunnable

## May be useful later
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.prompt_values import ChatPromptValue
from langchain_core.runnables import RunnableLambda, RunnableBranch, RunnablePassthrough
from langchain_core.runnables.passthrough import RunnableAssign
from langchain_community.document_transformers import LongContextReorder
from functools import partial
from operator import itemgetter

from langchain_community.vectorstores import FAISS

## TODO: Make sure to pick your LLM and do your prompt engineering as necessary for the final assessment
embedder = NVIDIAEmbeddings(model="nvidia/nv-embed-v1", truncate="END")
instruct_llm = ChatNVIDIA(model="meta/llama3-8b-instruct")

llm = instruct_llm | StrOutputParser()

docstore = FAISS.load_local("docstore_index", embedder, allow_dangerous_deserialization=True)
docs = list(docstore.docstore._dict.values())

def docs2str(docs, title="Document"):
    """Useful utility for making chunks into context string. Optional, but useful"""
    out_str = ""
    for doc in docs:
        doc_name = getattr(doc, 'metadata', {}).get('Title', title)
        if doc_name: out_str += f"[Quote from {doc_name}] "
        out_str += getattr(doc, 'page_content', str(doc)) + "\n"
    return out_str

chat_prompt = ChatPromptTemplate.from_template(
    "You are a document chatbot. Help the user as they ask questions about documents."
    " User messaged just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
    "\n\nUser Question: {input}"
)

def output_puller(inputs):
    """"Output generator. Useful if your chain returns a dictionary with key 'output'"""
    if isinstance(inputs, dict):
        inputs = [inputs]
    for token in inputs:
        if token.get('output'):
            yield token.get('output')

chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),  ## For the final assessment
    'generator' : RemoteRunnable("http://lab:9012/generator/"),  ## For the final assessment
}

basic_chain = chains_dict['basic']

## Retrieval-Augmented Generation Chain

retrieval_chain = (
    {'input': (lambda x: x)}
    | RunnableAssign({
        'context': itemgetter('input')
            | chains_dict['retriever']
            | LongContextReorder().transform_documents
            | docs2str
    })
)

output_chain = RunnableAssign({"output" : chains_dict['generator'] }) | output_puller
rag_chain = retrieval_chain | output_chain

app = FastAPI(
  title="LangChain Server",
  version="1.0",
  description="A simple api server using Langchain's Runnable interfaces",
)

## PRE-ASSESSMENT: Run as-is and see the basic chain in action

add_routes(
    app,
    llm,
    path="/basic_chat",
)

## ASSESSMENT TODO: Implement these components as appropriate

add_routes(
    app,
    retrieval_chain,
    path="/generator",
)

add_routes(
    app,
    output_chain,
    path="/retriever",
)

## Might be encountered if this were for a standalone python file...
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9012)
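
For what it's worth, a quick way to confirm the server is actually listening (independently of LangServe) is to hit FastAPI's auto-generated /docs endpoint. This is just a sanity-check sketch, using the port from the uvicorn call above:

import requests

## FastAPI serves interactive API docs at /docs by default;
## a 200 response means uvicorn is up and the routes were registered.
resp = requests.get("http://0.0.0.0:9012/docs", timeout=5)
print(resp.status_code)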

And the following is the "test code":

from langserve import RemoteRunnable
from langchain_core.output_parsers import StrOutputParser

## Test the /basic_chat route (streaming)
llm = RemoteRunnable("http://0.0.0.0:9012/basic_chat/") | StrOutputParser()
for token in llm.stream("Hello World! How is it going?"):
    print(token, end='')

## Test the /retriever route
retrieval_llm = RemoteRunnable("http://0.0.0.0:9012/retriever/") | StrOutputParser()
response = retrieval_llm.invoke({"input": "What is the topic of the document?"})
print(response)

## Test the /generator route
generator_llm = RemoteRunnable("http://0.0.0.0:9012/generator/") | StrOutputParser()
response = generator_llm.invoke({"input": "Hello World! How is it going?"})
print(response)

The problem in 09_langserve.ipynb is that it only works if I serve llm in the server code. If I route basic_chain, retrieval_chain, output_chain, or rag_chain instead and run the respective test cell, it runs forever: nothing is printed, and nothing shows up in the server terminal.
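
To make the hang visible, here is a minimal check with an explicit client-side timeout. This is just a sketch: httpx is the HTTP client that langserve itself uses, and /retriever/invoke is the standard LangServe invoke endpoint for the route defined above.

import httpx

## The same request the RemoteRunnable would make, but with a timeout so the
## hang surfaces as an exception instead of blocking forever.
try:
    r = httpx.post(
        "http://0.0.0.0:9012/retriever/invoke",
        json={"input": "What is the topic of the document?"},
        timeout=10.0,
    )
    print(r.status_code, r.text[:200])
except httpx.TimeoutException:
    print("Timed out: the route accepts the connection but never responds.")

Against /basic_chat/invoke the same pattern returns immediately, which matches what I see with the RemoteRunnable tests.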

I tried looking at other posts in this forum, but they didn't help. I would appreciate any help with these notebooks. In case there is something wrong in the rest of the code, I have created a repository with the full code.


Hello,

Welcome to the forums! I have forwarded your issue to the DLI team for investigation. Please note that response times could be much longer than usual during the holidays.

Tom

Did you get the problem resolved?