Hi guys. I was trying to finish the course Building RAG with LLMs, but I ran into some problems. In 08_evaluation.ipynb, I open the Gradio frontend, but when I click Evaluate I receive the following error:
Generating Synthetic QA Pair:
...
Gradio Stream failed: [Errno 111] Connection refused
Metric score of 0.0, while 0.60 is required
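For context, Errno 111 is ECONNREFUSED on Linux, which means nothing accepted the connection at the address the frontend tried to reach. Below is a minimal sketch for checking whether a port is accepting connections at all. The helper name port_is_open is hypothetical, and which host and port the frontend actually targets is an assumption on my side (the server code further down uses port 9012).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError subclasses (e.g. ConnectionRefusedError,
        # socket.timeout, name-resolution errors) when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("127.0.0.1", 9012) from the machine running the server should return True only while the server process is up and listening.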
Also, in 09_langserve.ipynb, I was trying to solve this. The server was running in this notebook, and I created another file to test some requests. The following code was in 09_langserve:
%%writefile server_app.py
# https://python.langchain.com/docs/langserve#server
from fastapi import FastAPI
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langserve import add_routes, RemoteRunnable
## May be useful later
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.prompt_values import ChatPromptValue
from langchain_core.runnables import RunnableLambda, RunnableBranch, RunnablePassthrough
from langchain_core.runnables.passthrough import RunnableAssign
from langchain_community.document_transformers import LongContextReorder
from functools import partial
from operator import itemgetter
from langchain_community.vectorstores import FAISS
## TODO: Make sure to pick your LLM and do your prompt engineering as necessary for the final assessment
embedder = NVIDIAEmbeddings(model="nvidia/nv-embed-v1", truncate="END")
instruct_llm = ChatNVIDIA(model="meta/llama3-8b-instruct")
llm = instruct_llm | StrOutputParser()
docstore = FAISS.load_local("docstore_index", embedder, allow_dangerous_deserialization=True)
docs = list(docstore.docstore._dict.values())
def docs2str(docs, title="Document"):
    """Useful utility for making chunks into context string. Optional, but useful"""
    out_str = ""
    for doc in docs:
        doc_name = getattr(doc, 'metadata', {}).get('Title', title)
        if doc_name: out_str += f"[Quote from {doc_name}] "
        out_str += getattr(doc, 'page_content', str(doc)) + "\n"
    return out_str
chat_prompt = ChatPromptTemplate.from_template(
    "You are a document chatbot. Help the user as they ask questions about documents."
    " User messaged just asked you a question: {input}\n\n"
    " The following information may be useful for your response: "
    " Document Retrieval:\n{context}\n\n"
    " (Answer only from retrieval. Only cite sources that are used. Make your response conversational)"
    "\n\nUser Question: {input}"
)
def output_puller(inputs):
    """Output generator. Useful if your chain returns a dictionary with key 'output'"""
    if isinstance(inputs, dict):
        inputs = [inputs]
    for token in inputs:
        if token.get('output'):
            yield token.get('output')
chains_dict = {
    'basic' : RemoteRunnable("http://lab:9012/basic_chat/"),
    'retriever' : RemoteRunnable("http://lab:9012/retriever/"),  ## For the final assessment
    'generator' : RemoteRunnable("http://lab:9012/generator/"),  ## For the final assessment
}
basic_chain = chains_dict['basic']
## Retrieval-Augmented Generation Chain
retrieval_chain = (
    {'input' : (lambda x: x)}
    | RunnableAssign(
        {'context' : itemgetter('input')
        | chains_dict['retriever']
        | LongContextReorder().transform_documents
        | docs2str
    })
)
output_chain = RunnableAssign({"output" : chains_dict['generator'] }) | output_puller
rag_chain = retrieval_chain | output_chain
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple api server using Langchain's Runnable interfaces",
)
## PRE-ASSESSMENT: Run as-is and see the basic chain in action
add_routes(
    app,
    llm,
    path="/basic_chat",
)
## ASSESSMENT TODO: Implement these components as appropriate
add_routes(
    app,
    retrieval_chain,
    path="/generator",
)
add_routes(
    app,
    output_chain,
    path="/retriever",
)
## Might be encountered if this were for a standalone python file...
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=9012)
And the following one is the “test code”:
from langserve import RemoteRunnable
from langchain_core.output_parsers import StrOutputParser
llm = RemoteRunnable("http://0.0.0.0:9012/basic_chat/") | StrOutputParser()
for token in llm.stream("Hello World! How is it going?"):
    print(token, end='')
retrieval_llm = RemoteRunnable("http://0.0.0.0:9012/retriever/") | StrOutputParser()
response = retrieval_llm.invoke({"input": "What is the topic of the document?"})
print(response)
generator_llm = RemoteRunnable("http://0.0.0.0:9012/generator/") | StrOutputParser()
response = generator_llm.invoke({"input": "Hello World! How is it going?"})
print(response)
The problem in 09_langserve.ipynb is that it only works if I use llm in the server code. With basic_chain, retrieval_chain, output_chain, or rag_chain, I run the respective cell, but it keeps running forever: nothing is printed, and nothing is shown in the server terminal.
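One thing that may be worth checking is whether the served chains end up calling back into the same server. The sketch below is only an illustration: the route_calls map is hand-transcribed from server_app.py above (which route serves which chain, and which RemoteRunnable routes that chain calls), and the helper name find_cycle is hypothetical. I am assuming each RemoteRunnable call is a plain HTTP request to the named route on the same server at port 9012.

```python
def find_cycle(calls):
    """Return one cycle of routes as a list, or None if the call graph has no cycle."""
    visiting = []   # routes on the current depth-first path
    done = set()    # routes already fully explored without finding a cycle

    def visit(route):
        if route in done:
            return None
        if route in visiting:
            # The route is already on the current path, so we walked in a circle.
            return visiting[visiting.index(route):] + [route]
        visiting.append(route)
        for target in calls.get(route, []):
            cycle = visit(target)
            if cycle:
                return cycle
        visiting.pop()
        done.add(route)
        return None

    for route in calls:
        cycle = visit(route)
        if cycle:
            return cycle
    return None


# Assumption: transcribed by hand from the add_routes calls in server_app.py.
route_calls = {
    "/basic_chat": [],              # serves llm, which calls no remote route
    "/generator": ["/retriever"],   # serves retrieval_chain, which calls chains_dict['retriever']
    "/retriever": ["/generator"],   # serves output_chain, which calls chains_dict['generator']
}

print(find_cycle(route_calls))  # prints ['/generator', '/retriever', '/generator']
```

If that map matches what the server really does, a request to either route would keep bouncing between the two, which might explain a cell that never returns. I have not confirmed that this is the actual cause.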
I looked at other posts in this forum, but they didn't help. I would appreciate any help with these notebooks. If you'd like to see the complete code in case something is wrong elsewhere, I created a repository with the full code.